When hacker group Impact Team released the Ashley Madison data, they asserted that "thousands" of the women's profiles were fake. Later, this number got blown up in news stories that asserted "90-95%" of them were fake, although nobody put forth any evidence for such an enormous number. So I downloaded the data and analysed it to find out how many actual women were using Ashley Madison, and who they were.
What I discovered was that the world of Ashley Madison was a far more dystopian place than anyone had realised. This isn't a debauched wonderland of men cheating on their wives. It isn't even a sadscape of 31 million men competing to attract those 5.5 million women in the database. Instead, it's like a science fictional future where every woman on Earth is dead, and some Dilbert-like engineer has replaced them with badly-designed robots.
Those millions of Ashley Madison men were paying to hook up with women who appeared to have created profiles and then simply disappeared. Were they cobbled together by bots and bored admins, or just user debris? Whatever the answer, the more I examined those 5.5 million female profiles, the more obvious it became that none of them had ever talked to men on the site, or even used the site at all after creating a profile. Actually, scratch that. As I'll explain below, there's a good chance that about 12,000 of the profiles out of millions belonged to actual, real women who were active users of Ashley Madison.
When you look at the evidence, it's hard to deny that the overwhelming majority of men using Ashley Madison weren't having affairs. They were paying for a fantasy.
The Evidence Mounts
Nobody, including the company itself, has disputed the dramatic gender disparity in the Ashley Madison user base: 5.5 million profiles are marked "female" in a database of roughly 37 million people.
It's also a matter of public record that some percentage of the profiles are less than real. A few years ago, a former employee of Ashley Madison sued the company in Canada over her terrible work conditions. She claimed that she'd gotten repetitive stress injuries in her hands after the company hired her to create 1,000 fake profiles of women in three months, written in Portuguese, to attract a Brazilian audience. The case was settled out of court, and Ashley Madison claimed that the woman never made any fake profiles.
Still, there is a clause in the Ashley Madison terms of service that notes that "some" people are using the site purely "for entertainment" and that they are "not seeking in person meetings with anyone they meet on the Service, but consider their communications with users and Members to be for their amusement." The site stops short of saying these are fake people, but it does admit that some of its profiles exist purely for amusement.
Based on this evidence, we've got some clear indications that many of the profiles are fake. To find out how many, though, we have to dip into the company's non-public information, contained in the data dumps.
The question is, how do you find fakes in a sea of data? Answering that becomes more difficult when you consider that even real users of Ashley Madison were probably giving fake information at least some of the time. But wholesale fakery still leaves its traces in the profile data. I spoke with a data scientist who studies populations, who told me to compare the male and female profiles in aggregate, and look for anomalous patterns.
My analysis had to be entirely based on the profiles themselves, not the credit card data. There is no such thing as a "paid account" for women because women don't have to pay for anything on Ashley Madison. As a result, I couldn't use "paid account" as a proxy for "real," the way analysts have done with the male data. Plus, the credit card data does not list gender — so it would have been impossible to be certain of gender ratios in the credit card information anyway.
In the profile database, each Ashley Madison member has a number of data fields, including obvious things like nickname, gender, birthday, and turn-ons; but the member profile also contains data that is purely for administrative use, like the email address used to create the account, and when the person last checked their Ashley Madison inbox.
I started my search in an obvious place. Were there any patterns in the personal email addresses that people listed when they signed up? I figured that if I were an admin at Ashley Madison creating fake profiles, I would use ashleymadison.com for the email addresses because it's easy and obvious. No real Ashley Madison customer would have an Ashley Madison company email. So I searched for any email address that ended in ashleymadison.com. Bingo. There were about 10 thousand accounts with ashleymadison.com email addresses. Many of them looked like they'd been generated by a bot, with dozens of near-identical addresses following the same pattern.
A quick comparison of men's and women's email addresses revealed that over 9 thousand of these ashleymadison.com addresses were used for female profiles, while roughly 1000 went to men or to profiles where no gender was specified.
This pattern was telling, but not damning. What it suggests is that the majority of obviously fake accounts — ones perhaps created by bored admins using their company's email address, or maybe real women using fake information — were marked female. These fakes numbered in the thousands, which is exactly what Impact Team suggested.
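For readers who want to try this kind of check themselves, here is a minimal Python sketch of the email search: keep only addresses whose host is ashleymadison.com (or a subdomain of it), then tally by gender. It is not the code used for this analysis. The record layout (one dict per profile with hypothetical "email" and "gender" keys) and the sample records are invented for illustration.

```python
from collections import Counter

def count_company_emails(profiles, domain="ashleymadison.com"):
    """Count profiles whose email host is `domain` (or a subdomain of it),
    grouped by gender. Each profile is assumed to be a dict."""
    counts = Counter()
    domain = domain.lower()
    for profile in profiles:
        email = (profile.get("email") or "").strip().lower()
        # Split on the last "@" so only the host part is compared.
        _, sep, host = email.rpartition("@")
        if sep and (host == domain or host.endswith("." + domain)):
            # A missing or blank gender gets its own bucket.
            counts[profile.get("gender") or "unspecified"] += 1
    return counts

# Invented sample records; the real dump's schema is not reproduced here.
sample = [
    {"email": "profile001@ashleymadison.com", "gender": "female"},
    {"email": "Profile002@AshleyMadison.com", "gender": "female"},
    {"email": "someone@example.com", "gender": "male"},
    {"email": "admin@ashleymadison.com", "gender": ""},
]
print(count_company_emails(sample))
```

Matching on the host after the "@" rather than a bare `endswith("ashleymadison.com")` avoids counting look-alike domains such as notashleymadison.com.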
Next I looked for patterns in IP addresses, which can reveal the location of the computers people used to open their accounts. The most popular IP address among both men and women belonged to a company called OnX, which hosted Ashley Madison's backups. That could mean a number of things, including that those were all accounts created by people working at Ashley Madison. It could also mean that there was a mass migration of data at some point and everybody's IP address was changed to Ashley Madison's host address. There were no weird gender anomalies in this data, though: about 82 per cent of the accounts tied to the OnX address belonged to men, which is close to the percentage of men in the database.
But the second most popular IP address, found in 80,805 profiles, was a different story. This IP address, 127.0.0.1, is well known to anyone who works with computer systems as the loopback address. Programmers jokingly call it "home," because it always points back to the very computer you're using. An account recorded with that IP address was most likely created on one of Ashley Madison's own machines, not by a customer connecting over the internet. Interestingly, 68,709 of the profiles created with that IP address were female, and the remaining 12,000 or so were either male or had nothing in the gender field.
That's a huge disparity. In a database of 85% men, you'd expect the accounts behind any given IP address to be about 85% male. So it's remarkable to discover that about 85% of the accounts created from a "home" IP address are female. This strengthened the pattern I'd already seen with the ashleymadison.com email addresses: obviously fake accounts were overwhelmingly female, and numbered in the tens of thousands.
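The loopback test and the gender comparison can be sketched in a few lines of Python. This is an illustration, not the analysis code. The hypothetical `is_loopback` helper uses the standard library's `ipaddress` module, which also counts the rest of the 127.0.0.0/8 block and the IPv6 address ::1 as loopback. The z-score treats each profile as an independent draw from a population that is about 15 per cent female; that's a simplification, but it shows how far from chance a split like this is.

```python
import ipaddress
import math

def is_loopback(ip_string):
    """True if ip_string parses as a loopback address (127.0.0.0/8 or ::1)."""
    if not ip_string:
        return False
    try:
        return ipaddress.ip_address(ip_string.strip()).is_loopback
    except ValueError:
        # Malformed IP fields are treated as "not loopback".
        return False

def female_share_z(female_count, total_count, baseline_share):
    """Observed female share for one IP address, plus a normal-approximation
    binomial z-score against the database-wide female share."""
    observed = female_count / total_count
    expected = baseline_share * total_count
    sd = math.sqrt(total_count * baseline_share * (1 - baseline_share))
    return observed, (female_count - expected) / sd

# Counts quoted in this article: 68,709 female profiles out of 80,805
# created from 127.0.0.1, against about 5.5 million women in roughly
# 37 million profiles overall.
baseline = 5.5e6 / 37e6
share, z = female_share_z(68_709, 80_805, baseline)
print(is_loopback("127.0.0.1"), f"{share:.1%} female vs {baseline:.1%} baseline, z = {z:.0f}")
```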
Another weird detail was that the most popular female last name in the database was an extremely unusual one, which matched the name of a woman who worked at the company about ten years ago. This unusual name had over 350 entries, as if she or someone else was creating a bunch of test accounts. The most popular male last name, on the other hand, was Smith, followed by Jones. That matches typical surname distributions in North America.
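Finding the most common surname for each gender is a simple tally. A minimal sketch, assuming each profile is a dict with hypothetical "last_name" and "gender" keys; the sample names are invented.

```python
from collections import Counter, defaultdict

def top_surnames(profiles, n=2):
    """Return the n most common last names for each gender."""
    by_gender = defaultdict(Counter)
    for p in profiles:
        # casefold() so "smith" and "Smith" are counted as one name.
        name = (p.get("last_name") or "").strip().casefold()
        if name:
            by_gender[p.get("gender") or "unspecified"][name] += 1
    return {gender: counter.most_common(n) for gender, counter in by_gender.items()}

sample = [
    {"last_name": "smith", "gender": "male"},
    {"last_name": "Smith", "gender": "male"},
    {"last_name": "Jones", "gender": "male"},
    {"last_name": "Placeholder", "gender": "female"},
    {"last_name": "placeholder", "gender": "female"},
    {"last_name": "Brown", "gender": "female"},
]
print(top_surnames(sample))
```

A single surname with hundreds of entries at the top of one gender's list, as described above, stands out immediately in output like this.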
That said, I also found millions of unique IP addresses and emails among the women, just as there were among the men. That's exactly what you'd expect from a random batch of 37 million people. The "birthday" field also showed no gender anomaly, though for a very different reason: both genders had obviously fake birth dates. Two-thirds of men and women claimed their birthdays fell in January. This is a standard sign of people picking the first month that pops up in the drop-down menu. Obviously, birthdays in the actual population are spread fairly evenly across all twelve months. But the online population, filling out forms on a sex site? Their birthdays tend to clump around the easiest month to pick on a form, and this kind of fakery is actually a sign of humanness.
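One rough way to quantify that clumping is to compare the share of birthdays in January with the roughly 1/12 you'd expect if birthdays were spread evenly. A sketch, assuming birthdays have already been parsed into Python `date` objects; the sample dates are made up, and treating every month as 1/12 of the year ignores small differences in month length.

```python
from collections import Counter
from datetime import date

def january_excess(birthdays):
    """Share of birthdays in January divided by 1/12, the share expected if
    birthdays were spread evenly over the year. 1.0 means no clumping."""
    months = Counter(d.month for d in birthdays if d is not None)
    total = sum(months.values())
    if total == 0:
        return None
    return (months[1] / total) * 12

sample = [date(1970, 1, 1), date(1975, 1, 1), date(1980, 1, 1),
          date(1965, 1, 1), date(1972, 6, 14), date(1968, 9, 3)]
print(january_excess(sample))
```

A field in which two-thirds of birthdays fall in January scores 8, eight times what an even spread would produce.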
Again and again, the female profiles showed patterns that suggested a disproportionate number of them were fake accounts or test accounts. Still, the numbers were only in the tens of thousands. And a lot of the other data looked relatively normal.
Where the Women Aren't
Then, three data fields changed everything. The first field, called mail_last_time, contained a timestamp indicating the last time a member checked the messages in their Ashley Madison inbox. If a person never checked their inbox, the field was blank. But even if they'd checked their messages only once, the field contained a date and time. About two-thirds of the men, or 20.2 million of them, had checked the messages in their accounts at least once. But only 1492 women had ever checked their messages. It was a serious anomaly.
The pattern was reflected in another data field, too. This one, called chat_last_time, contained the timestamp for the last time a member had struck up a conversation using the Ashley Madison chat system. Roughly 11 million men had engaged in chat, but only 2400 women had.
Yet another field, reply_mail_last_time, showed a similar disparity. This field contained the time when a member had last replied to a message from another person on Ashley Madison. 5.9 million men had done it, and only 9700 women had.
What all these fields have in common is that they measure user activity. They show what happened after the account profile was created, and how an actual person used it by checking messages, chatting, or replying to messages. They measure what you might call signatures of real human behaviour. Only a paltry number of women's accounts actually looked human.
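Tallying those three activity fields is straightforward. A minimal sketch, assuming each profile is a dict keyed by the field names described above, with None or a blank string meaning the member never did that thing; the sample records and timestamps are invented.

```python
from collections import Counter

# Field names as described in this article. Treating None or a blank string
# as "never happened" is an assumption about the record layout.
ACTIVITY_FIELDS = ("mail_last_time", "chat_last_time", "reply_mail_last_time")

def is_blank(value):
    return value is None or str(value).strip() == ""

def activity_counts(profiles):
    """Per gender, count profiles with a timestamp in each activity field,
    profiles with at least one, and all profiles."""
    counts = Counter()
    for p in profiles:
        gender = p.get("gender") or "unspecified"
        counts[(gender, "total")] += 1
        active_fields = [f for f in ACTIVITY_FIELDS if not is_blank(p.get(f))]
        for field in active_fields:
            counts[(gender, field)] += 1
        if active_fields:
            counts[(gender, "any")] += 1
    return counts

sample = [
    {"gender": "male", "mail_last_time": "2015-06-01 10:00:00",
     "chat_last_time": "", "reply_mail_last_time": None},
    {"gender": "male", "mail_last_time": "", "chat_last_time": "",
     "reply_mail_last_time": ""},
    {"gender": "female", "reply_mail_last_time": "2014-02-03 08:15:00"},
    {"gender": "female"},
]
counts = activity_counts(sample)
for gender in ("male", "female"):
    share = counts[(gender, "any")] / counts[(gender, "total")]
    print(gender, f"{share:.0%} of profiles show any activity")
```

The separate "any" tally matters because the three fields overlap: a member who chatted has probably also checked mail, so adding the three per-field totals together would double-count.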
But what about that seemingly odd disparity between the numbers of women checking messages (1492), and replying to messages (9700)? Even that can be explained by looking at how actual humans use Ashley Madison.
When you log into your Ashley Madison account, you're prompted to answer messages before you visit your inbox. A dialog box pops up, suggesting that you reply to all your messages in bulk, with a canned reply like "I only reply to full messages," or "Please send me a message and photo." In other words, you can reply to several mails at the same time without ever actually checking or opening your mail. So it's easy to imagine that most of the almost 10 thousand women who replied to messages did so after being prompted, and that only about 1500 of them ever clicked the button to open their inboxes.
Both the Impact Team and disgruntled users of Ashley Madison have called the site fraudulent, mostly because the company charged men to shut down their accounts — and then actually kept their data. I found ample evidence of this kind of fraud in the database. There were 173,838 men's accounts with the email address listed as <paid_delete>, and 12,108 women's accounts. All other data in those accounts had been retained.
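Checking whether data survived a paid delete is another simple pass over the profiles. This sketch assumes, hypothetically, that each profile is a dict, that a paid delete shows up as the literal string <paid_delete> in the email field (as described above), and that any other non-blank field counts as retained data. The sample records are invented.

```python
from collections import Counter

PAID_DELETE_MARKER = "<paid_delete>"

def paid_delete_report(profiles):
    """Per gender, count paid-delete profiles and how many of them still
    hold non-blank data in fields other than email and gender."""
    report = Counter()
    for p in profiles:
        if (p.get("email") or "").strip() != PAID_DELETE_MARKER:
            continue
        gender = p.get("gender") or "unspecified"
        report[(gender, "paid_delete")] += 1
        retained = any(
            key not in ("email", "gender") and value not in (None, "")
            for key, value in p.items()
        )
        if retained:
            report[(gender, "data_retained")] += 1
    return report

sample = [
    {"email": "<paid_delete>", "gender": "male", "nickname": "invented1", "city": "Toronto"},
    {"email": "<paid_delete>", "gender": "female", "nickname": ""},
    {"email": "someone@example.com", "gender": "female", "nickname": "invented2"},
]
print(paid_delete_report(sample))
```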
It's worth noting that those 12,108 <paid_delete> women's accounts may represent the only true number we've got for women who used the site. After all, paying to delete an account is a sure sign of activity, though of course it's evidence of disengagement rather than the amorous engagement that Ashley Madison promised.
Overall, the picture is grim indeed. Out of 5.5 million female accounts, a vanishingly small fraction, well under 1 per cent, had ever shown any kind of activity at all after the day they were created.
The men's accounts tell a story of lively engagement with the site, with over 20 million men hopefully looking at their inboxes, and over 10 million of them initiating chats. The women's accounts show so little activity that they might as well not be there.
Sure, some of these inactive accounts were probably created by real, live women (or men pretending to be women) who were curious to see what the site was about. Some probably wanted to find their cheating husbands. Others were no doubt curious journalists like me. But they were still overwhelmingly inactive. They were not created by women wanting to hook up with married men. They were static profiles full of dead data, whose sole purpose was to make men think that millions of women were active on Ashley Madison.
Ashley Madison employees did a pretty decent job making their millions of women's accounts look alive. They left the data in these inactive accounts visible to men, showing nicknames, pictures, sexy comments. But when it came to data that was only visible to company admins, they got sloppy. The women's personal email addresses and IP addresses showed marked signs of fakery. And as for the women's user activity, the fundamental sign of life online? Ashley Madison employees didn't even bother faking that at all.
There are definitely other possible explanations for these data discrepancies. It could be that the women's data in these three fields just happened to get hopelessly corrupted, even though the men's data didn't. Or maybe most of those accounts weren't deliberately faked, but just represented real women who came to the site once, never to return.
Either way, we're left with data that suggests Ashley Madison is a site where tens of millions of men write mail, chat, and spend money chasing women who aren't there.
Thanks to Carlos Aguilar and Josh Laurito for tips and help analysing the Ashley Madison dataset.