Data Leak Shows Chinese Firm Compiled Data, Social Media Posts of Millions

Data storage units plugged into an IBM mainframe at the CeBIT technology conference in Hanover, Germany in 2015; used here as stock photo. (Photo: Sean Gallup, Getty Images)

A Chinese intelligence firm’s database on 2.4 million people — including some 50,000 Americans — was recently leaked, exposing it to researchers.

Per a Tuesday report in the Register, Fulbright University Vietnam researcher Chris Balding and Australian security researcher Robert Potter co-authored a recent paper on Shenzhen-based company Shenzhen Zhenhua Data Technology, whose data on millions was obtained by an Australian-based firm called Internet 2.0. Balding wrote in a blog post that the leaked database was compiled from “a variety of sources [and] is technically complex using very advanced language, targeting, and classification tools.” The team argued that the data was gathered as a tool for Chinese intelligence, military, and security agencies for “information warfare and influence targeting” (i.e. exposing weaknesses of or ways to influence targeted persons or institutions).

The vast majority of the database, which Balding and Potter said is called the Overseas Key Information Database (OKIDB), was compiled from public sources like social media feeds, a practice called data scraping that may violate rules on some sites but is otherwise legal in the U.S. But the two researchers estimated that between 10 and 20 per cent of it was culled from non-public sources, though they had no evidence one way or the other as to whether it originated from hacks or somewhere else. Tens of thousands of profiles in OKIDB concern prominent people, from politicians and military officials to businesspeople, celebrities, and criminals; the team wrote that the database also contains details on infrastructure and military operations in multiple countries.

What’s less clear is whether Zhenhua’s data is particularly useful for nefarious purposes. According to the Washington Post, which reviewed portions of the database, Zhenhua markets itself as aiming to do business with the Chinese military, though there’s nothing to indicate it has secured contracts with the Chinese government. Experts consulted by the Post gave mixed signals as to whether it amounted to much more than a data scrape.

“There might be gold in there, but this is not something that’s useful enough for military or intelligence targeting,” one cybersecurity contractor for the federal government told the Post, adding Zhenhua appeared to be “aspirational” rather than effective.

Georgetown University Centre for Security and Emerging Technology senior fellow Anna Puglisi, a former counter-intelligence official specializing in East Asia, told the Post the U.S. focuses on “what’s directly tied to what military or intelligence officer, the spy-on-spy stuff like what we had with the Soviet Union” when it comes to China. But she said Chinese intelligence officials have a more “holistic” approach to open-source intelligence and “things like LinkedIn, social media — this seems like an evolution of that methodology.”

University of Canterbury in Christchurch professor Anne-Marie Brady told the Guardian that the CCP and China’s Ministry of State Security already compile “whole books” of information on foreign targets, but what would be unusual here is “the use of big data and outsourcing to a private company.”

Some of the tools detailed in Balding and Potter’s paper include a system for tracking the U.S. Navy that associates social media posts with specific ships and also contained some (patchwork) information on naval officers.

“The data collected about individuals and institutions and the overlaid analytic tools from social media platforms provide China enormous benefit in opinion formation, targeting, and messaging,” the two researchers wrote in the paper. “From the assembled data, it is also possible for China even in individualized meetings be able to craft messaging or target the individuals they deem necessary to target.”

However, the OKIDB data didn’t include information on what it was used for. The team wrote that they could not find “direct evidence of Chinese agencies using this data to craft information warfare campaigns, messaging, anonymous account usage, or individual influence targeting.” According to the Post, Zhenhua is little-known, but claimed on its website to partner with TRS, a firm that provides big data analysis for China’s military and Ministry of Public Security. Other listed partners included big data and security hardware firm Huarong and a firm called Global Tone Communication Technology, which is a “subsidiary of a state-owned enterprise owned by the central propaganda department” and claims to analyse 10 terabytes of data a day for clients.

China has built an elaborate domestic digital surveillance state involving everything from face recognition to content monitoring and censorship, but it’s not by any stretch of the imagination the only actor scraping the web. U.S. firms do too, whether it’s the incomprehensible amount of data sucked up for marketing purposes or shady face recognition companies working with police. Anyone exposed in a prior data breach could find their information resurfacing in any number of other places.

“If there’s a silver lining here, it’s we can do to China what they do to us,” House Intelligence Committee member Representative Jim Himes told the Post.

“The report is seriously untrue,” a spokesperson for Zhenhua, identified only as Sun, told the Guardian. “Our data are all public data on the internet. We do not collect data. This is just a data integration. Our business model and partners are our trade secrets. There is no database of 2 million people.”

“… We are a private company,” the spokesperson added. “Our customers are research organisations and business groups.”