By now you’ve surely heard of Heartbleed, the hole in the internet’s security that exposed countless encrypted transactions to any attacker who knew how to abuse it. But how did it actually work? Once you break it down, it’s actually incredibly simple. And a little hilarious. But mostly terrifying.
You can read our overview of Heartbleed here, but in general terms it’s a flaw in something called OpenSSL, a widely used open-source software library that implements the SSL/TLS security protocols. Those protocols encrypt your traffic and let your computer and a server know they are who they say they are. The flaw left major sites like Yahoo, Flickr, and Imgur vulnerable to data theft for roughly two years. It’s pretty scary stuff, and worth a closer look. Fortunately, it’s out there for everyone to see.
The beauty of an open-source project like OpenSSL is that anyone can look at code; there’s no way to hide anything in there on purpose. In fact, you can see precisely where Heartbleed was born and where it was fixed, even though you might not be able to make heads or tails of it.
That’s what makes it so surprising that Heartbleed went unnoticed for so long. Two years hiding in plain sight, unseen by even experienced coders. But once you cut to the heart of what went wrong, the problem is as clear as day, and hilariously simple.
Listen to my heartbeat
Heartbleed isn’t a problem with the TLS/SSL technologies that encrypt the internet. It’s not even a problem with how OpenSSL works in theory. It’s just a dumb coding mistake.
When two computers hold an encrypted connection open, they can exchange something called a heartbeat, an act from which the bug gets its awesomely terrifying name.
Heartbeats are a way for two computers who are talking to each other to make sure the other is still alive, so that if something goes wrong during a process, it doesn’t keep going. They do this by sending data back and forth to each other.
The client (that’s you) sends its heartbeat to the server (your bank, say), and the server hands it right back. That way if something goes wrong during the transaction (e.g. if a computer literally explodes) the other one will know, because the heartbeats get out of sync. It’s like making sure that both spindles in a cassette tape are moving when you’re playing it. If one spindle stops and the other keeps going, something will break.
It’s a simple process, replicated millions of times a day all over the world. But somehow, bugged versions of OpenSSL managed to screw it up. Sean Cassidy explains it wonderfully — and in crazy depth — on his blog Existential Type Crisis. But the actual breach that’s bringing the internet to its knees happens in this tiny line of code:
memcpy(bp, pl, payload);
Hold onto your butts, this is going to get a little technical, but we’ll follow up with a clumsy metaphor to try and clear things up a bit.
Put as simply as possible, memcpy is a command that copies data, and it requires three pieces of information to do the job; those are the terms in the parentheses. The first bit of info is the final destination of the data that needs to be copied. The second is the location of the data that needs to be copied. The third is the amount of data the computer is going to find when it goes to make that copy. In this case, bp is a place on the server computer, pl is where the actual data the client sent as a heartbeat is, and payload is a number, supplied by the client, that says how big pl is.
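To make those three pieces concrete, here is a minimal sketch in C of a well-behaved copy. The helper name echo_heartbeat is made up for illustration and is not taken from OpenSSL; only the memcpy call mirrors the line quoted above, and the sketch assumes the caller is honest about the size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not OpenSSL code: copy a heartbeat of `payload`
 * bytes from `pl` into `bp`. The caller promises that both buffers
 * really are at least `payload` bytes long. */
void echo_heartbeat(unsigned char *bp, const unsigned char *pl, size_t payload)
{
    assert(bp != NULL && pl != NULL);
    memcpy(bp, pl, payload); /* destination, source, number of bytes */
}
```

As long as payload matches the real size of pl, whatever goes in at pl comes back out at bp unchanged.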
The important thing to know here is that copying data on computers is trickier than it seems because there’s really no such thing as “empty” memory. So bp, the spot where the client data is going to be copied, isn’t actually empty. Instead it’s full of whatever data was sitting in that part of the computer before. The computer just treats it as empty because that data has been marked for deletion. Until it’s filled up with new data, the destination bp is a bunch of old data that’s been OK’d to be overwritten. It’s still there though.
Now ideally, when memcpy takes the data from pl and slaps it in bp, it covers up all that old, garbage data in bp. After all, payload says how big pl is, and the space at bp was created to be exactly the same size; a perfect fit. When it goes off without a hitch, everything that used to be at bp is destroyed and filled up with the pl data. And that is what gets sent back to the client: Exactly what they sent in the first place. What you’re left with is a tidy little 1:1 transaction where what goes in also comes back out.
It works great, unless payload is lying. The catch is that payload comes straight from the client, and the bugged code never checks it against what actually arrived. If payload says that pl is 64 KB when it is really 0 KB, you have a problem. memcpy dutifully copies 64 KB starting at pl, but the real heartbeat ends almost immediately, so it keeps right on reading into whatever memory happens to sit next to it, and that memory isn’t empty either. In practice, that means data that had nothing to do with the heartbeat gets copied into bp and passed back to the client. Sometimes that data is harmless, sometimes it’s your banking password. Either way, it ends up somewhere it shouldn’t.
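Here is one way to picture a lying payload in code. This is a toy sketch, not OpenSSL’s code: the arena array, its contents, and the function names are all invented for illustration. In the real bug, the copy starts at pl and keeps reading past the end of the short heartbeat into neighboring memory. The sketch imitates that inside a single array so that it stays well-defined C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 32

/* Toy stand-in for a slice of server memory (hypothetical layout). */
static unsigned char arena[ARENA_SIZE];

/* Put a 2-byte heartbeat at the front and unrelated data right behind it. */
void fill_arena(void)
{
    memset(arena, 0, sizeof arena);
    memcpy(arena, "hi", 2);                    /* the real heartbeat */
    memcpy(arena + 2, "password=hunter2", 16); /* someone else's data */
}

/* Buggy echo: copies as many bytes as the client claims to have sent. */
size_t buggy_echo(unsigned char *bp, size_t payload)
{
    const unsigned char *pl = arena; /* where the heartbeat starts */

    assert(payload <= ARENA_SIZE);   /* keeps the toy inside its own array */
    memcpy(bp, pl, payload);         /* no check against the real 2 bytes */
    return payload;
}
```

Call fill_arena() and then buggy_echo(reply, 18), and reply ends up holding the two heartbeat bytes followed by the sixteen bytes of neighboring data that the client never sent.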
Got it? In short: lol whoops!
A clumsy metaphor
OK, that’s confusing. Here’s a simpler way to understand it, taking the code out of it entirely.
Imagine you have a whole bunch of photos, and you’re going to a store for a box to keep them in. The guy who runs the store is very stupid, and can’t count at all.
You walk into this store with 100 photos, and you slap them on the counter saying “I have 100 photos.” The owner’s eyes light up with joy. “I have a box for those!” he says. “I have a 100 photo box!” He pulls out a box from beneath the counter, and says “Here it is! Somebody left it here full of photos, but nobody needs them any more.”
Then he takes one photo out of the box, burns it, and puts one of your photos in. He does this over and over until he’s out of photos to put in. At the end of that process, the box is now full of your photos, and he slides it back to you. You have your box full of your photos, and all the old photos are destroyed. Hooray! A tidy little 1:1 exchange.
But imagine if instead of 100 photos you gave him only one. You walk up to the counter, grin a villainous grin, slap down your one photo and say “I have 100 photos.” Again, the owner has a box for you, and pulls out a box full of 100 photos that someone left there. Again, he takes a photo out of the box, burns it, and puts yours in. Then, after just one photo, he is all out of photos, and because he is very stupid and can’t count at all, he assumes this means his job is done and he slides the box back to you, with your one photo and 99 of someone else’s. He’s taken your word for it, despite all evidence to the contrary.
This means that you get to walk away with 99 photos that don’t belong to you, and maybe one of them is of a naked person! Score! Even better, this store owner is so dumb he can’t even distinguish zero photos from non-zero photos. If you just say you have 100 photos and give him literally nothing, he’ll still give you a box of 100 photos that belong to someone else.
In the case of Heartbleed, those photos are bits of data. Sometimes these bits of data fit together in order to be an email, or a password, or a username. Sometimes they even fit together to be a big website’s password, a signature stamp with its name on it, and the keycode to its security system. The selection of scraps you get is random, but you can do the trick as many times as you want, and eventually you get something good. Just keep asking for boxes.
That’s what nefarious folks aware of Heartbleed can do: keep asking a server for information, over and over, until it sends back something juicy.
Tiny mistake gets a tiny fix
The fix? As simple as the mistake.
/* Read type and payload length first */
if (1 + 2 + 16 > s->s3->rrec.length)
    return 0; /* silently discard */
hbtype = *p++;
n2s(p, payload);
if (1 + 2 + payload + 16 > s->s3->rrec.length)
    return 0; /* silently discard per RFC 6520 sec. 4 */
pl = p;
This chunk of code has two very simple jobs, as Sean Cassidy explains. The first is to check against zero-length heartbeats; to make sure that when you say you are giving the server dollars, you are giving it a non-zero number of dollars. The second makes sure you are giving the number of dollars you say you are. That’s it.
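The same two checks can be pulled out into a standalone function. This is a rough sketch rather than OpenSSL’s actual code: the function name and parameters are invented, and it assumes the layout suggested by the constants in the fix, with 1 byte for the type, 2 bytes for the length field, and at least 16 bytes of padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check, modeled on the fix quoted above.
 * Returns 1 if the heartbeat is consistent, 0 if it should be discarded. */
int heartbeat_length_ok(size_t record_length, size_t payload)
{
    const size_t header = 1 + 2; /* type byte plus 2-byte length field */
    const size_t padding = 16;   /* minimum padding after the payload */

    assert(payload <= 0xFFFF);   /* a 2-byte field cannot claim more */

    if (header + padding > record_length)
        return 0; /* too short to be a heartbeat at all */
    if (header + payload + padding > record_length)
        return 0; /* claims more payload than actually arrived */
    return 1;
}
```

A record of 19 bytes with a claimed payload of 0 passes. The same record with a claimed payload of 65535 is rejected, which is exactly the lie Heartbleed relied on.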
This kind of bug is common. It even has a name: a buffer over-read, a close cousin of the classic buffer overflow. If you’ve ever written code, you know that “forgetting to do an obvious thing to check user input that really probably will never be wrong” is one of the most common mistakes you’ll ever make. I can still remember my high school C++ teacher harping at us to verify the length of a user’s input. Always. Just because, that’s why.
Fortunately, this OpenSSL bug is simple and the fix is easy to roll out, though that does very little to fix the damage that’s already been done. In the end, it all comes down to that horrible and wonderful principle of computing that we’ve all come up against at one point or another: A computer will do exactly what you tell it to do, nothing less, nothing more. And because that computer is perfectly obedient and therefore also dumb as hell, you can’t afford to be.