Genealogy is an old hobby that has had a recent surge in popularity thanks to new technology. In addition to attics and archives, ancestry sleuths can now log on to popular websites like Ancestry.com and connect with millions of others to whom they may be genetically linked.
To Columbia University computer scientist Yaniv Erlich, this wealth of genealogical information was a treasure trove of data just waiting to be mined.
"I was thinking about it, and realised we can leverage the hard work of millions of genealogists who were just interested to learn about family history," Erlich, who is also the chief science officer at the genealogy and DNA testing company MyHeritage, told Gizmodo. "You take this hard work and you have this beautiful data set about humanity that you can study."
On Thursday, Erlich's insights from that data set appeared in the journal Science, in the form of an enormous family tree of 13 million people, a picture of marriage, migration, and human connectivity going back 500 years - all woven together from millions of online genealogy profiles.
Erlich and his colleagues downloaded 86 million public profiles from Geni.com, a collaborative genealogy website owned by MyHeritage, and used graph theory to clean and organise the data. What emerged was ultimately a picture of just how interconnected we all really are. From many smaller family trees, the interconnected profiles converged into one massive family tree of 13 million people, spanning an average of 11 generations.
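To give a sense of how graph theory can merge many small trees into one, here is a minimal sketch in Python. It is not the researchers' actual pipeline: it assumes each profile is a node and each recorded family link is an edge, then finds the largest group of profiles that are all connected to one another. The function name `largest_connected_tree` and the sample names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque


def largest_connected_tree(links):
    """Return the largest set of profiles that are linked to one another.

    `links` is an iterable of (profile_a, profile_b) pairs, each standing
    for a recorded family relationship (an assumption for this sketch).
    """
    # Build an undirected graph: every profile points to its relatives.
    neighbours = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    largest = set()
    for start in neighbours:
        if start in seen:
            continue
        # Breadth-first search collects everyone reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            person = queue.popleft()
            for relative in neighbours[person]:
                if relative not in component:
                    component.add(relative)
                    queue.append(relative)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest


# Hypothetical sample data: two separate family trees.
links = [
    ("Ann", "Ben"),
    ("Ben", "Cara"),
    ("Cara", "Dev"),
    ("Eli", "Fay"),
]
print(sorted(largest_connected_tree(links)))
# prints ['Ann', 'Ben', 'Cara', 'Dev']
```

In this toy example, four of the six profiles fall into a single connected tree, while the remaining two form a smaller tree of their own.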
"It's phenomenal," Adam Rutherford, a British geneticist and author of the book A Brief History of Everyone Who Ever Lived, told Gizmodo. "In a sense, this is what we've been waiting for. We've known for a few years now how closely related we are, first theoretically using maths, then using genetic similarities. But to actually get this idea onto a real family tree populated by literally millions of people is just incredible."
So what can we learn from a family tree of 13 million people? Erlich said that, especially when combined with other data, the information could prove a valuable tool for studying things like migration, marriage, fertility, and longevity.
For example, their research found that between 1800 and 1850, people travelled farther than ever to find a mate, up to 12 miles, which was a significant distance back then. At that time, they were also more likely to marry close relatives. This, the researchers hypothesized, suggests that the eventual decline in consanguineous marriage had more to do with changing social norms than with an increased ability to travel farther to find a mate.
Another data dive revealed that genes explained only about 16 per cent of the longevity variation seen in their data, and indicated that good longevity genes can only extend someone's life by an average of five years. Erlich said that this suggests lifestyle choices have much more weight than genes when it comes to how long you live.
The anonymized dataset is available for academic researchers to use via FamiLinx.org, a website created by Erlich and his colleagues. However, the data has its limits. For example, 85 per cent of profiles originate from Europe and North America.
"Genealogy is one of the world's most popular hobbies, and what they have done here is turn the that into a scientific resource that will be mined for decades," said Rutherford, who was not involved with this study. "It's stunning."
Geni.com's individual family trees are populated both with information amassed the old-fashioned way - photographs, records - and through DNA testing. While determining how Irish or English you are based on DNA can be an imprecise science at best, the science of who you are related to is much more exact. DNA testing is able to match us to our close relatives with impressively little error. The most common mistake when matching relatives by DNA is in the number of generations: mistaking, say, a second cousin for a third.
Erlich conceded that there could be errors in the data based on information people input from their own research and records. But, since it's collaborative, like Wikipedia, he said, eventually any errors could sort themselves out.
"I think that the most beautiful thing about this work is how you can leverage the work of so many genealogists," Erlich said, "And extract some knowledge about humanity."