Delete Never: The Digital Hoarders Who Collect Tumblrs, Medieval Manuscripts, And Terabytes Of Text Files

When it comes to their stuff, people often have a hard time letting go. When the object of their obsession is rooms full of old clothes or newspapers, it can be unhealthy, even dangerous. But what about a stash that fits on ten 13cm hard drives?

Online, you’ll find people who use hashtags like “#digitalhoarder” and hang out in the 120,000-subscriber Reddit forum called /r/datahoarder, where they trade tips on building home data servers, share collections of rare files from video game manuals to ambient audio records, and discuss the best cloud services for backing up files.

Stereotypical hoarders, who let heaps of physical items of questionable utility dominate their homes and lives, often suffer social stigma and anxiety as a result. By contrast, many self-proclaimed digital hoarders say they enjoy their collections, can keep them contained in a relatively small amount of physical space, and often take pleasure in sharing them with other hobbyists or with anyone who wants access to the same public data.

“Data hoarder means to me simply someone who collects and curates digital data,” said the user -Archivist, one of the moderators of /r/datahoarder, in a private message on Reddit. “It’s a little removed from the disorder we usually see from traditional hoarders.”

He and many of his fellow subreddit users also take pride in keeping their data well organised into folders and subfolders. Some even take pains to keep the forum itself from getting bogged down with dubious material: One of the most popular recent threads begs users to stop spamming the subreddit with photos of their hard drives.

“Data hoarding isn’t about just buying $4,232 worth of hard drives just for posting them here,” wrote user Nooco24, one of the subreddit’s moderators. “What’s interesting is what you do with your storage.”

What users seem to prefer to see are discussions of unusual and intricate storage setups, guides to using complex archive software and, of course, interesting datasets, from public-domain collections of vintage scientific papers to old BBC sound effect samples. Public archives, naturally, are a plus.

In addition to roughly 2.6 petabytes stored on a system of servers in his spare room — data collection size is the one fact each moderator highlights on the forum’s mod list — -Archivist is also the data curator and server manager of The Eye, a sprawling online archive of everything from vintage movie posters to beer-brewing guides to video games from short-lived console systems of the 1980s. A German resident in his late 20s who restores historic paintings and documents for a living, -Archivist said he got his start collecting printed and digitised medical journals.

“After that came piracy, which I was introduced to early on by my stepfather,” he quipped. That led him to start building collections of movies and TV shows. Today, he personally prefers to collect digital books and texts, which he said are often quick to disappear from the internet.

“Most other data types aren’t so rare,” he said. “Weird and obscure books and texts seem to vanish first.”

Many people active in the data hoarding community take pride in tracking down esoteric files of the kind that often quietly disappear from the internet—manuals for older technologies that get taken down when manufacturers redesign their websites, obscure punk show flyers whose only physical copies have long since been pulled from telephone poles and thrown in the trash, or episodes of old TV shows too obscure for streaming services to bid on—and making them available to those who want them.

GitHub, owned by Microsoft since late last year, is mostly known for hosting source code for collaborative programming projects. But it’s also home to a collection of works by the Polish surrealist painter Zdzisław Beksiński uploaded by the user itdaniher, a Midwesterner and /r/datahoarder user who’s been collecting data for a decade and asked to only be identified by their username.

“I’ve been in touch with his estate a little bit, and they’re fine with me hosting a mirror of his works,” said itdaniher, who first obtained the images from a shared BitTorrent file, in a phone interview. Another file they uploaded to GitHub is a database mapping more than 2,000 common names of plants to their Latin scientific names, with entries from “Abe Lincoln Tomato” to “Zuni Gold Bean.” Itdaniher, who also enjoys gardening, doesn’t identify as a true “hoarder.” “I try to exercise a certain level of judiciousness,” they said, adding that they usually spend three or four hours a week archiving. They hope to expand the plant list into a larger project documenting ideal temperatures, soil and other conditions for growing the various plants, and expect to find that data scattered across the internet, just as the list of names initially was.

“The internet is a big place, and a lot of times I will find other people who have HTML tables on their web pages that have some information, but a small fraction of the information that I want,” itdaniher said. “Sometimes it’s finding personal sites where [someone’s said] here’s the list of the common and Latin names for the plants I’m growing this year.”

Itdaniher, an experienced Linux system administrator, also runs software provided by the group Archive Team to help download materials at risk of disappearing from the internet and help them make their way to the nonprofit Internet Archive.

Founded by the digital archivist and filmmaker Jason Scott in 2009, Archive Team calls itself “a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage.” Members frequently scramble to preserve aspects of internet history before they disappear as sites fade from the web. Through a mix of manual labour and distributed bots, the project has archived large swaths of sites including the classic free web host Geocities, the text-hosting platform Etherpad and the blog platform Xanga.

“Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions—and done our best to save the history before it’s lost forever,” the group says on its official site.

Itdaniher shared with Scott a collection of Tumblr postings linked from Reddit and tagged as “not safe for work” as part of a global effort to preserve adult content on the now-Verizon-owned blogging network, after the company controversially announced it would no longer allow such material. At least 344,000 archived Tumblr sites marked for deletion are en route to the Internet Archive or already uploaded there, where they’ll be publicly accessible, Scott said.

“I was able to contribute to that larger project of saving that aspect of internet culture for future generations,” said itdaniher.

Some /r/datahoarder users acknowledge they collect files that other people might not find interesting. HeloRising, a man in his mid-30s from the Pacific Northwest, said via Reddit PM that he’s built up a collection of high-quality digital copies of illuminated manuscripts, which he finds fascinating but has yet to find other users interested in sharing. The files sometimes get posted by institutions that house and scan the medieval documents, but they’re often difficult to download and can disappear over time or live on only in obscure online archives.

“The illuminated manuscripts are unicorns,” he said. “They turn up in odd places.”

HeloRising, who has about 30 terabytes of data in total and spends five or six hours per week on the hobby, said the Reddit community has been a “treasure trove” of useful advice and information. It’s a common sentiment among users, who enjoy solidarity and support on the subreddit, where a recent comment thread was filled with excitement about a newly organised collection of thousands of vintage video game manuals.

“Having a community is great,” said itdaniher. “It makes me feel like the time that I spend, I’m working towards a common goal of not throwing things down the proverbial memory hole, the 1984 trash disposal of uncomfortable facts.”

While people with hoarding disorders are often isolated, embarrassed and overwhelmed by disorganized piles of clutter, members of /r/datahoarder tend to take pride in their digital collections and thrive on keeping them organised, whether for sharing or personal use. More than a few work in technology or simply enjoy tinkering with computers, so tweaking download scripts and data storage networks is a fun part of their hobby, not a chore. Some also share custom-crafted archiving tools and other software they’ve created on GitHub, which can serve as a portfolio for those seeking programming jobs or just a high-tech social outlet.

“With time flying, we aren’t just people archiving data together, we are more than that,” said Corentin Barreau, a 19-year-old administrator on The Eye who is nicknamed “The French Guy,” in a Twitter direct message. “Beside that, I have an affection to everything that links to collections, even IRL, I like to collect, and it’s peaceful to sort data, it’s satisfying. And the joy of people when you share something [is] worth more than everything.”

His most prized archive is a set of “family memories,” digitised from analogue photos and VHS tapes taken by his loved ones over the years. Barreau keeps local copies of the digital versions and also looks after cloud backups and the analogue originals.

“That’s the most exciting thing [I’ve] done, and the collection I’m most proud of,” he said.

Barreau said he doesn’t see himself as a hoarder in a negative sense, since it doesn’t negatively impact his personal life.

“It’s just a passion, like people doing sports every day, or painting,” he said with an ASCII wink.

As with other mental health issues, experts say hoarding really becomes an issue when it interferes with people’s happiness or gets in the way of everyday life. Collecting, on the other hand, can be a perfectly healthy hobby, whether people are collecting baseball cards or rare Frank Zappa MP3s.

“The collections tend to give pride and positive feelings, whereas hoarding tends to be associated with stress and disorganization,” said Gregory Chasson, an associate professor of psychology at the Illinois Institute of Technology who has studied hoarding disorder. “There doesn’t tend to be a sense of cohesion or a theme.”

And digital media’s small physical footprint means it’s harder for even disorganized files on hard drives or USB sticks to grow unmanageable and dominate spaces the way physical collections of clothes, books or other materials can.

“I walk into homes where I can’t discern where sleeping, bathing and eating takes place because of the volume of the stuff,” said Regina Lark, owner of the Los Angeles area professional organising firm A Clear Path, which helps people with physical hoarding problems. “I would imagine the uber-acquiring of digital media is not impairing the quality of your life, unless that is what you’re spending your life on, is acquiring.”

Still, problem digital hoarding, where massive collections of files, inbox messages and other digital data bring stress to their owners, isn’t unheard of, including among people who already struggle with hoarding tangible objects. Chasson said that, anecdotally, it’s not uncommon to see people with hoarding issues who also have computer desktops riddled with icons or email accounts stuffed with unread messages. There hasn’t yet been much formal research into digital hoarding, he said. But a recent paper he coauthored does suggest a connection with physical hoarding, finding “higher levels of physical acquiring behaviours were significantly related to increased distress” when experimental subjects were falsely told a digital item from their Pinterest collections would be deleted.

“Ultimately, I think it’s tapping into the same mechanisms for a lot of people,” he said.

Both physical and digital hoarding can be motivated by the fear of permanently losing something important, even if others might think it’s easily replaceable or simply trash, said the creator of the YouTube channel I am a Compulsive Hoarder, a self-proclaimed “disposophobic” (referring to her fear of throwing out something that might prove valuable) who asked that her name not be used.

“I start thinking, but that particular article has such good information, I’m not going to find it again,” she said. “We can’t even consider the possibility we could find a better article.”

She said she has a tendency to store disorganized collections of web articles describing exercises she’s never done, foods she’s never prepared and even treatments for hoarding. Managing text messages can also be stressful, since she worries about deleting conversation histories en masse without going through each individual message. Even e-commerce can bring challenges for people with hoarding issues, she said, as websites guilt them into signing up for inbox-clogging discount newsletters they hesitate to delete or unsubscribe from.

“They get inundated about marketing emails,” she said. “Once you’re there, it’s hard to get unsubscribed, because now you’ve got FOMO.”

When old files do turn out to be valuable—like old Christmas newsletters that bring back old memories, or a wedding speech she recently unearthed and shared with a delighted friend—she has to remind herself it’s not a reason to stockpile every bit of data.

“When I found something else everyone else is so glad I kept, I really have to splash cold water on my face and tell myself, don’t let this be a reason to start saving stuff,” she said. “I don’t want to keep getting more hard drives.”

The fact is, though, it is often genuinely difficult for users without a decent amount of technical experience to find the right balance. Many systems don’t make it easy to find, organise and back up valuable files while shunting more ephemeral data to the digital trash heap. Social networking sites are notoriously difficult to search, let alone download content from. Cloud services often shut down or change policies with little notice, said the Archive Team’s Jason Scott, citing Tumblr’s about-face on erotic pictures, Google’s move to shut down the social network Google+, and the venerable photo-sharing site Flickr’s recent announcement that it would begin purging images from legacy free accounts with more than 1,000 pictures uploaded as of March 12.

“We have consistently been working since the mid-80s to turn every single aspect of life into a digital file in one way or another,” Scott said. “People are suddenly discovering they don’t own their data, and all your life is data.”

Archive Team sometimes finds itself effectively the last stop before data disappears from shuttering services. That means there’s often little time or desire to distinguish between trash and treasure. But many of the group’s volunteer archivists—some of whom also frequent forums like /r/datahoarder—are more inclined to find joy and pride than frustration in loading their hard drives and public online archives with as much data as they can save for posterity.

“People are like really, you’re gonna save a bunch of furry art?” Scott said. “Well, we don’t know, and we’re not going to be the ones to make that decision.”

Steven Melendez is an independent journalist living in New Orleans.