IBM’s Terrorist-Hunting Software Raises Troubling Questions

Last week, Defense One published an article about a new use that IBM is pioneering for its data-crunching software: identifying potential terrorists in the stream of refugees entering Europe. As with so many stories of new technologies being deployed for law enforcement or counterterrorism purposes, the article leaves the reader with more questions than answers.

IBM publicly unveiled the software — called i2 EIA, for Enterprise Insight Analysis — in late 2014, plugging it as a big data tool that can “uncover hidden criminal threats buried deep inside massive volumes of disparate corporate data.” It was marketed as a cybercrime sleuth, an “always-on” detective that could boost the “ability of investigators to find that illusive [sic] needle in a haystack that helps them detect a cyber attack.”

According to Defense One, however, the software may be getting repurposed. There is concern that some of the refugees entering European countries — particularly men of fighting age — may be ISIS fighters seeking to slip in undetected. “IBM believes the tool could help governments separate real refugees from imposters, untangle terrorist cells, or even predict bomb attacks,” Patrick Tucker reports. In one experiment, IBM’s program crunched data from a variety of sources, including casualty lists and the black market for passports, to assign a “risk score” to individual immigrants; a higher risk score would flag an individual for closer scrutiny.
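How such a score would actually be computed is not public. As a purely hypothetical sketch, a score of this kind might combine weighted signals drawn from the data sources the article mentions; every field name, weight, and threshold below is invented for illustration and should not be read as a description of i2 EIA:

```python
# Toy illustration of a weighted "risk score". All features and weights
# are invented; IBM has not disclosed how its software scores individuals.
def risk_score(record: dict) -> float:
    # Hypothetical signals loosely inspired by the data sources named in
    # the article (casualty lists, the black market for passports).
    weights = {
        "passport_flagged_stolen": 0.5,    # document surfaces in black-market data
        "name_on_casualty_list": 0.3,      # claimed identity matches a reported death
        "inconsistent_travel_history": 0.2,
    }
    return sum(w for feature, w in weights.items() if record.get(feature))

applicant = {"passport_flagged_stolen": True, "inconsistent_travel_history": True}
print(risk_score(applicant))  # 0.7 -- would exceed a hypothetical review threshold
```

Even this toy version makes the transparency problem concrete: unless the features and weights are disclosed, a flagged individual has no way to know which inputs drove the score, let alone to contest them.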

The article is challenging to unpack; it is credulous about the abilities of IBM’s software, and it may attribute grander aspirations to IBM than the company itself claims. Either way, it raises several questions that warrant deeper analysis. How could the veracity of a risk score be challenged? Is IBM also, as Defense One suggests, trying to predict bomb attacks? And is i2 EIA the latest iteration of the collect-it-all mentality?

First, the risk score. To IBM’s credit, the company emphasized that the score (hypothetical as of now) would only be the start of analysis, not the end. Nevertheless, if it were rolled out, would there be a mechanism for refugees to challenge either the underlying data or the conclusions drawn? (Note that there are also significant questions about who would actually be using the risk score, but the available information is too vague to make accurate guesses.) Disputing the score would require knowing what data went into producing it, but the history of similar types of software suggests that it is highly unlikely that either IBM or its government clients will make the scoring process transparent.

The best analogy may be to “predictive policing” software, which is increasingly in vogue with US police departments. Generally speaking, predictive policing seeks to forecast where crime is going to occur or who is likely to commit a crime, and to deploy officers in response to those forecasts. One of the chief criticisms of predictive policing is that the programs are almost entirely opaque; even when companies designing the software reveal the kinds of information they use, the algorithm itself is a black box. This intentional obscurity makes it close to impossible for an outside observer to audit the fairness of the algorithm or for a targeted community to independently evaluate the legitimacy of police officers’ increased presence in their neighborhood.

A software system targeted at identifying potential terrorists is highly likely to be cloaked in secrecy as well. If so, it may be a nearly impossible task to challenge the score that it spits out. And if the score is used to make a decision about who stays and who goes, that inscrutability may have life-or-death consequences.

Second, what exactly is the software intended to show? As noted above, Tucker contends that in addition to identifying individuals for extra investigative attention, IBM believes the software could “predict bomb attacks.”

The description of the company’s actual experiment suggests something slightly less sci-fi: not a terrorism forecaster, but an attempt to find networks of associates related to a phone number used to set off a bomb via text message. If IBM is hoping its software program could actually predict upcoming terrorist attacks, however, that possibility has been pretty thoroughly debunked. One major study commissioned by the Defense Department concluded that “there is no credible approach that has been documented … to accurately anticipate” terrorist threats. This is because there have been so few terrorist attacks that there is no reliable terrorism “signature,” and no way to establish a common pattern.

This is quite unlike instances of credit card fraud, which occur in large numbers and create a robust pattern that can be divined — including by IBM’s software — within the massive volume of credit card transactions. As security expert Bruce Schneier explains, “Terrorist plots are different, mostly because whereas fraud is common, terrorist attacks are very rare. This means that even highly accurate terrorism prediction systems will be so flooded with false alarms that they will be useless.” Until bomb attacks become as common as credit card fraud, using software to predict a bomb attack will be an expensive exercise in futility.
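Schneier’s point can be made concrete with a back-of-the-envelope calculation. The numbers below are illustrative only (not drawn from IBM or the article), but they show the base-rate problem: when the target event is vanishingly rare, even a very accurate detector buries its true hits under false alarms:

```python
# Base-rate illustration with hypothetical numbers: a rare event plus an
# accurate detector still yields an alert stream that is mostly noise.
population = 1_000_000      # people screened
true_threats = 1            # actual attackers among them (rare event)
sensitivity = 0.99          # P(flagged | threat)
false_positive_rate = 0.01  # P(flagged | no threat) -- already very low

true_alarms = true_threats * sensitivity
false_alarms = (population - true_threats) * false_positive_rate

print(f"True alarms:  {true_alarms:.2f}")
print(f"False alarms: {false_alarms:.0f}")
# Roughly 10,000 innocent people are flagged for every real threat,
# so nearly every alert an investigator sees is a false alarm.
```

Credit card fraud escapes this trap because the event is common: with thousands of true positives a day, the same false-positive rate produces an alert stream in which genuine fraud is actually well represented.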

Finally, the article puts forward a stark proposition in support of bulk collection: “the more data i2 EIA gets, the more helpful it becomes.” But is that really true? While the contention is intuitively appealing, it is not always the case.

This has been borne out tragically in several real-life scenarios. Take the “underwear bomber,” who was narrowly thwarted in his attempt to bring down a Detroit-bound plane in 2009 by quick-thinking fellow passengers, not by crack intelligence work. A White House review of the incident concluded that the problem was not a dearth of information but an overabundance, and that much of the key information was hidden in a “large volume of other data,” making it difficult to discern. The FBI’s inability to identify the threat posed by Major Nidal Hasan, who fatally shot 13 people at Fort Hood in 2009, was also blamed in part on the “crushing volume” of information that obscured possible red flags.

Even the National Security Agency, whose computing power is unmatched, can become overwhelmed by too much data. In one document leaked last year, an analyst emphasized that “expand[ing] the haystacks while we search for the needles” actually undermines the agency’s mission; instead, he or she urged, “prioritization is key.” Similarly, in a 2012 top secret briefing, an NSA working group urged the agency to narrow its collection to avoid drowning in data: in the briefing document’s words, the agency needed to shift from “order[ing] one of everything off the menu and eat[ing] what you want” to “memorializ[ing] what you need.” Tucker cites a report from the National Academy of Sciences on the NSA’s bulk collection program in support of his claim that more is always better, but in fact that report reached a much more prosaic conclusion: More information is better if you want to answer questions about the past. When it came to fending off future acts of violence, the massive information-collection program was a dud; the President’s own Review Group concluded that the program “was not essential to preventing attacks.”

It is certainly important to ensure that the refugees entering Europe are who they say they are, and cutting-edge technology may be a part of that process. But at least in the counterterrorism context, both predictive software and mass data collection pose challenges worthy of in-depth discussion. Whether IBM is actually experimenting with these programs or the article is inflating the company’s aspirations, it is critical that we have a serious conversation about predictive analytics rather than prematurely celebrating the victory of technology over terrorism. 

About the Author(s)

Rachel Levinson-Waldman

Counsel to the Brennan Center’s Liberty and National Security Program