The Data Hoarders Resisting Trump’s Purge

“This is how we know about our country,” Lynda Kellam, a social scientist and data librarian, told me. “People who support the ‘drain the swamp’ mentality don’t seem to understand how much the government does.” Kellam, who’s served in leadership positions at two Ivy League universities, described the vulnerable data as “irreplaceable.” The U.S. government is one of the world’s biggest publishers, and its research on everything from carbon emissions to infant mortality is conducted on a scale that few private institutions can match. Kellam told me that she’d participated in a small data-preservation effort during Trump’s first term, but had never seen anything like the frenzy that ignited in late January, when the C.D.C. began removing information from its website. (It has since been restored by means of a temporary restraining order.) As the DOGE-led assault on civil society continued, her peers began saving files and hosting “datathons” at their universities.

Kellam was encouraged, but worried about a lack of coördination. What if the backups languished on private hard drives? What if archivists duplicated one another’s work? She created a Google Doc to centralize information about preëxisting initiatives—an archive of archives, with detailed instructions on how to contribute to each. “It was really just meant to be a place where people could go and nominate things for the End of Term crawl,” she said. Within days, though, there were more than a hundred people in the document at any given time. Kellam met with the heads of other data-librarian organizations, and together they founded the Data Rescue Project to preserve the enormous data sets that website-focussed efforts had missed. Its tracker now catalogues more than four hundred publicly accessible volunteer backups of government repositories, from the C.F.P.B.’s Consumer Complaint Database to the C.D.C.’s National Immunization Survey.

“Everything is at risk,” Sebastian Majstorovic, who administers the tracker, told me. … Now he’s teaching dozens of Americans to save data from their own rogue government. In early February, he began downloading census records from ftp.census.gov just before it went down—and ended up with only around two hundred gigabytes of data. But strangers online had grabbed other directories, and together they were able to complete the backup. “I think we’ll be surprised by how many things have been saved by people we don’t yet know, because they haven’t had a chance to give it to someone,” he told me. When I asked who they were, Majstorovic had a simple answer: “Nerds who care.”

They came, in many cases, from r/DataHoarder, a subreddit with nearly a million members devoted to preserving files. The data hoarders collect zines, manuals, family photos, old television shows, and defunct websites—just about everything digital or digitizable at risk of disappearance. Their tastes run a wide gamut…. If the community has a politics, it’s defying the corporate stranglehold on the ownership of media—even personal data, which have been increasingly corralled into proprietary clouds.

By mid-February, the Data Rescue Project was recruiting from r/DataHoarder and a few related networks. Majstorovic and others began teaching the less experienced members how to back up government data with ArchiveTeam Warrior—an app whose creators have launched a data-rescue campaign—and to upload it to a secure public repository called DataLumos. Kellam had never even heard of data hoarders before this year. But in the first week, she told me, just one of them contributed an estimated forty per cent of the uploads. They’re a largely anonymous bunch, but those willing to speak to me were all young, male I.T. professionals—normie counterparts of the DOGE tech tyros joyriding through civil society’s back end.

Thanks, in part, to these volunteer efforts, the archivists I spoke to were confident that much less government data will be permanently lost than was initially feared. But they also saw little reason for complacency. “What we don’t know is how much material has been changed,” Mark Graham, the director of the Internet Archive’s Wayback Machine, told me. His team is tabulating how many dot-gov pages with certain keywords have been modified or deleted; in the lead are “health policy,” “World Health Organization,” and “systemic racism.” Their backups are foundational to many of the more recent efforts to archive the federal web. But they’re also closer to “snapshots” than functional substitutes: What use is an archived F.D.A. finding aid if it’s been disconnected from back-end data, and doctors without coding skills can’t use it to research clinical trials?

Last week, the guerrilla archiving movement reached an important milestone, when restoredCDC.org went online. It’s a replica of the health agency’s pre-Trump website based on backups from r/DataHoarder—one that’s fully functional, with a reconstructed back end and interactive tools.

* * *

Lambert here: With all the namby-pamby material on masking intact, no doubt.

* * *

The Data Rescue Project’s next priority is finding a decentralized storage solution for the data it already has. Majstorovic is working on a way to break up hundreds of terabytes into chunks small enough to share via BitTorrent, which stores files distributively among users. The result might be less vulnerable to censorship than central servers. But it would also require even more people to donate their time and terabytes. He’s encouraged by the commitment shown by volunteers who ran out of hard-drive space on a previous campaign. “They started uninstalling their games,” he told me. “I thought that was the ultimate nerd sacrifice.”
