FEMA Contractor Tracing Coronavirus Deaths Uses Web Scraping, Social Media Monitoring

On December 31, few Americans had ever heard the word “coronavirus.” It was unfathomable that in cities like Pittsburgh, the U.S. National Guard would soon be deployed to work food lines that stretch beyond sight. Few knew the emotional strain of having to physically separate themselves from a loved one for weeks on end. No one was ready to hear the economy being compared to the Great Depression again so soon.

As Americans rang in the new year with fireworks (and the time-honoured tradition of watching CNN anchors get sloshed), algorithms designed to help anticipate the kind of ramping chaos only a rapidly spreading pathogen can bring began to detect an unusual amount of health-related chatter in China, according to John Goolgasian, chief operating officer at Geospark Analytics. The surge came primarily from Wuhan, one of China’s central cities, a sprawling metroplex of more than 11 million inhabitants.

Goolgasian, whose firm uses machine learning and big data to forecast events that may be disruptive to its high-dollar clients (the U.S. military among them), admits that, like the rest of the world, Geospark Analytics was mostly working without a flashlight. “We saw there was this pneumonia or SARS-like thing happening, so we ran some retrospective analysis and shot it out to our users on that day,” he told Gizmodo.

That analysis, titled “The 5 Things You Need to Know,” listed, among other items of interest (clashes between Chilean police and protesters, and a fire near India’s Kandla Port), a SARS-like virus spreading in Wuhan, sparse details of which were offered under the curious subheading, “Pneumonia outbreak?” Earlier that afternoon, Chinese authorities had confirmed that 27 people were infected with a mystery “pneumonia” of “unknown origin.”

The first death attributed to the novel coronavirus, which at the time had no name, followed 11 days later.

Geospark Analytics’ product, called Hyperion, the namesake of the Titan son of Uranus (meaning, “watcher from above”), fingered Wuhan as a “hotspot,” in the company’s parlance, within hours after news of the virus first broke. “Hotspots tracks normal patterns of activity across the globe and provides a visual cue to flag disruptive events that could impact your employees, operations, and investments and result in billions of dollars in economic losses,” the company’s website says.

Whether Geospark Analytics’ private and public clients took any action based on its December 31 alert is hardly the software’s responsibility. Unlike Hyperion, many of its mortal users simply ignored the clear signs that a cataclysmic event was barreling towards them, many of them until after panic beset the masses.

On March 21, the Department of Homeland Security awarded Geospark Analytics a $US150,000 ($250,108) contract to provide FEMA with “geospatial analysis in support of disaster survivors.” Goolgasian, who spent two decades at the National Geospatial-Intelligence Agency, the Pentagon’s mapmaker—and did a stint at the CIA, based on an introduction he gave on a 2017 panel—declined to say whether the contract relates specifically to FEMA’s coronavirus efforts.

“I can talk about what we do, but I don’t want to get into the details of the contract,” Goolgasian said.

FEMA did not respond to Gizmodo’s request for comment.

Geospark Analytics has been sucking up data on the virus from a variety of sources since the pandemic began as part of an effort to determine which counties are at the highest risk. This involves combing through millions of social media posts “and everything else around it,” Goolgasian said, as well as datasets from hospitals around the United States. “We created this living model or seven-day forecast of where the growth of the virus could be,” he said, “based on death rates and existing hospital infrastructure.”

In December, Geospark Analytics received $US250,000 ($416,846) from the Department of Defence as part of a small business research award. It had previously taken on Air Force contracts involving “global stability, threat, and operational risk forecasting,” for a total of $US165,000 ($275,118), records show. (Geospark Analytics of Herndon, Virginia, is not to be confused with GeoSpark of North Potomac, Maryland, a company that focuses on cell phone location intelligence, another area of interest for the federal government. When we asked Goolgasian whether the two companies are related in any way, he was steadfast: “We are completely separate, not even close to doing the same thing.”)

In the last year, Geospark Analytics claims to have processed “6.8 million” sources of information; everything from tweets to economic reports. “We geo-position it, we use natural language processing, and we have deep learning models that categorise the data into event and health models,” Goolgasian said. It’s through these many millions of data points that the company creates what it calls a “baseline level of activity” for specific regions, such as Wuhan. A spike of activity around any number of security-, military-, or health-related topics and the system flags it as a potential disruption.

Amid the unrest in Hong Kong last year instigated by planned changes to the city’s extradition laws, for example, Hyperion alerted its users to a “significant increase in negative activity in Hong Kong.”

In a promotional blog post, Geospark Analytics explained that at the time, Hyperion highlighted certain areas in Hong Kong, where millions of anti-government protesters had gathered, with an “interactive icon” on the platform’s global map. “By clicking on this icon a user will be able to access all relevant articles and social media posts that Hyperion has identified,” it said, adding that the function “provides content related to the recent activity and allows users to take a historical look at the region going as far back as 90 days,” including “social media posts.”

Goolgasian, pressed on the privacy implications, said that monitoring social media is only a “small piece” of what Geospark Analytics does and that it pursues “more authoritative and validated” sources. Social media data is, after all, notoriously unreliable. A 2016 study, for example, found that Google prominently surfaced information about a much-discussed “cholera” epidemic in the United States in 2007 “as a result of Oprah Winfrey picking Love in the Time of Cholera as book of the month in her book club.”

“We rely more on traditional data sources and we don’t do anything that isn’t publicly available,” Goolgasian said, echoing a common refrain among data firms that fuel surveillance products by mining the internet itself. Earlier this year, CEO Hoan Ton-That of facial recognition firm Clearview AI defended his company’s aggressive web scraping by arguing he had a First Amendment right to data made public by users on social media. Several major companies, including Google and Facebook, have indicated they plan to take legal action.

“Whether it comes from purchasing information through APIs, through RSS feeds or web scraping, or even looking at things like state-level department of health data, we get the latest and most authoritative information,” Goolgasian said.

Goolgasian was also contacted by U.S. Senator Ron Wyden’s office on Friday. A longtime supporter of digital privacy, Wyden is working to get a handle, an aide said, on the flood of data firms approaching the government with solutions to the coronavirus. While Goolgasian did not offer any further details about Geospark Analytics’ work for Homeland Security, he was adamant that the company considers certain types of data strictly off-limits: “We DO NOT process any cell data. It has been something that we have purposefully stayed away from for the reasons you are concerned about,” he wrote in an email shared with Gizmodo.

Despite downplaying social media’s role in Hyperion’s forecasts, Geospark Analytics announced last year that it established an agreement with Twitter granting it access to an “enhanced data stream.” “Adding this real-time data source to our war chest of unique data will further enhance situational awareness and instantly notify users of breaking events in the time it takes to write a tweet,” it said.

Geospark Analytics product manager Serena Kelleher-Vergantini elaborated after the announcement that by “stream” the company meant enterprise access to Twitter’s API, also known as Firehose, which she went on to describe as completely useless without Hyperion. “Needless to say, no matter how much effort we put into building the initial rules, the results were mediocre at best,” she wrote, describing Hyperion’s filters for events like “earthquakes” or “terrorism.”

“The valuable tweets were there, but they were drowning in a sea of back-and-forth tweets between people arguing (over a terrorist event), emotional tweets of people who felt like they had an experience that ‘felt like an earthquake,’ and tweets about a drink referred to as ‘the landslide,’” she said, adding: “trust us when we say, you really don’t want the Twitter firehose. What you need is a platform like Hyperion that will filter out the noise to find the Twitter data you need.”

Twitter has not yet responded to a request for comment.

Twitter has a complicated history with government contractors using Firehose to monitor its users’ speech. In 2016, for example, the platform severed ties with multiple analytics firms—effectively shuttering some of them—citing a longstanding rule against the sale of user data for “surveillance” purposes. (The decision came after intense reporting by the Guardian, Daily Dot, and other outlets, along with pressure by the ACLU.) “Using Twitter’s Public APIs or data products to track or profile protesters and activists is absolutely unacceptable and prohibited,” Twitter said at the time.

What may prove most consequential is the difference in how these companies present their products: those Twitter banned then, such as Geofeedia, which promoted its ability to monitor Black Lives Matter protests, and Geospark Analytics, which monitors protests in South America, Southeast Asia, and elsewhere. The user data provided by Twitter in both cases is essentially the same. But whereas Geofeedia presented itself as a “surveillance” company, Geospark Analytics appears to avoid the term, even if skill at intelligence gathering is why it’s in business with the government in the first place.

While the privatisation of intelligence is nothing new, Geospark Analytics’ contract with Homeland Security comes at a chaotic moment for the agency.

Jared Kushner, presidential son-in-law and senior advisor, who has zero experience in emergency management, was appointed to supervise response efforts at FEMA and muster the support of private industry assets. Politico reported Thursday that Kushner and his team of technocrats have taken an “all-of-private-sector” approach, tasking potentially unvetted outside advisors with solving problems related to producing medical supplies and a lack of covid-19 testing.

“It’s a little crazy,” one advisor, reportedly brought on to assist the government, told the reporters. “It’s all hands on deck—it’s literally, who’s got the technology and data? Who can help us?”