The Internet Archive Fights Wiki Citation Wars With Books

Thanks to the laborious stewardship of the blessed Internet Archive and the unyielding armies of Wikipedia’s citizen scholars, we may, within our lifetimes, reach a consensus on basic historical information.

Last week, the Internet Archive announced that it’s been filling out Wikipedia’s book citations with links to two-page previews of the scanned book, so that the cited passage can be viewed with a bit of surrounding context. If the book predates 1923, and is therefore in the public domain, you can likely see the whole thing. So far, the IA claims to have turned 130,000 references into live links from 50,000 books in English, Greek, and Arabic. They hope, in the words of Wayback Machine director Mark Graham, to “achieve Universal Access to All Knowledge.”

The Internet Archive has long been wending its way through vast tracts of Wikipedia entries; scroll down to the bottom of virtually any Wiki page, and you’ll probably find the tiny bootprints of the InternetArchiveBot, which has weeded out about 13 million rotten links and supplanted them with Wayback Machine-archived pages. According to Internet Archive founder Brewster Kahle, Wayback Machine links are now the top-clicked citation links “by a factor of three.”

I just pulled up Genghis Khan’s Wikipedia entry and found the InternetArchiveBot’s trail all throughout the citations; every fourth link sends me to the Wayback Machine, and one sends me to a two-page preview of a 1998 history of Mongols in the Internet Archive. In a perfect world, I’ll eventually be able to visit page 313 of The New Encyclopaedia of Islam without hitting a Google Books paywall.

The IA’s bot draws on a collection of 3.8 million books, Chris Freeland, the director of the Internet Archive’s Open Library, tells Gizmodo. That collection, according to Graham, is currently being scanned by 100 paid workers at 22 worldwide locations at a rate of 1,000 books per day, with millions waiting in storage centres in California, in addition to operations out of the Getty, the Boston Public Library, the U.S. Library of Congress, and Princeton University.

The Internet Archive is also rescuing tens of thousands of books from deep storage. For example, when Phillips Academy planned to stock away pallets of books in preparation for its library renovation, the Internet Archive swooped in and digitised the 70,000-book collection. Now Phillips can offer its own members an exclusive Internet Archive link to borrow against a hard copy.

“Librarians have been able to confidently weed excess, outdated materials from our collection,” San Francisco Public Library librarian Michael Lambert has said, “secure in knowledge that the books will not disappear, but rather have a new life where people around the world can read and research the materials that SFPL has meticulously collected over the decades.”

About 1 million of the Internet Archive’s books are modern, i.e., post-1923, and subject to copyright law. The IA’s Open Library has coined the term “controlled digital lending” (CDL), an “own-to-loan” program dictating that the number of copies the library physically owns will be proportional to the number it digitally loans. (CDL has drawn the ire of publishers and of author groups such as the Authors Guild, who object to CDL as a way to circumvent the e-lending licenses that libraries purchase from authors and publishers; many have issued DMCA takedown notices.

“We’re really not focused on the most recent books,” Open Library director Chris Freeland told Gizmodo in a call. “Our goal is to un-blank the 20th century: books from the 1920s to the late 1990s, for which there is often no digital equivalent.” The Internet Archive is recognised by the state of California as a library and uses digital rights management software to limit sharing, downloads, and print-outs.)

Wikipedia book citations may add an extra patina of authority to term papers and blog posts, but the Internet Archive-Wikipedia partnership envisions grander plans. Last week at the Internet Archive’s annual bash, founder Brewster Kahle spoke of the misinformation crisis during the 2016 election and segued into Wikipedia Executive Director Katherine Maher’s prophetic warning that “the truth might fracture.”

Maher had been referring to citation wars, in which fact-checkers bitterly debate that which should be self-evident (see concentration camps). “Wikipedia is built on the idea that on any particular subject, a consensus will arise,” Kahle said. “That we’ll be pushing and shoving, but we will arrive at a consensus.”

In lieu of that, the solution was to point to print, which, hopefully, America hasn’t totally written off along with TV and digital media. “We know that a lot of the best, most vetted information that we have are in things like books,” he said.

Each of those books costs the Internet Archive about $30 to acquire, digitise and store in both physical and digital forms, so you can help them out by sponsoring a book. If you want to see a book that’s not on the list, you can click on the “Want to Read” button and put that book up for sponsorship.

