A Few Keystrokes Could Solve the Crime. Would You Press Enter?

Suppose a laptop were found at the apartment of one of the perpetrators of last year’s Paris attacks. It’s searched by the authorities pursuant to a warrant, and they find a file on the laptop that’s a set of instructions for carrying out the attacks.

The discovery would surely help in the prosecution of the laptop’s owner, tying him to the crime. But a junior prosecutor has a further idea. The private document was likely shared among other conspirators, some of whom are still on the run or unknown entirely. Surely Google has the ability to run a search of all Gmail inboxes, outboxes, and message drafts folders, plus Google Drive cloud storage, to see if any of its 900 million users are currently in possession of that exact document. If Google could be persuaded or ordered to run the search, it could generate a list of only those Google accounts possessing the precise file — and all other Google users would remain undisturbed, except for the briefest of computerized “touches” on their accounts to see if the file reposed there.

A list of users with the document would spark further investigation of those accounts to help identify whether their owners had a role in the attacks — all according to the law, with a round of warrants obtained from the probable cause arising from possessing the suspect document.

What I’ve proposed is entirely hypothetical, but completely plausible. Our fact pattern is derived from a student law review note written exactly 20 years ago. But at that time, the thought experiment required a little suspension of disbelief combined with a lot of intrusion — the original essay speculated that the government could design a computer worm that could hack and then perform a search like this on millions of individual PCs. Today, the search would be much more straightforward and technically plausible: asking Google and its counterparts in cloud email and storage, from Microsoft to Yahoo to Apple, to run a trivially easy search, no hacking required. That’s because all the documents are already in those intermediaries’ possession.

But is this search, even with such a favorable setup, a good idea? And would it be, in the United States, constitutional?

Imagine that you’re a decisionmaker at Google and have received a plea from the authorities to voluntarily run the search in question. You’re arguably allowed to run it without getting into legal trouble with your users: the Google privacy policy provides for the sharing of information with law enforcement or others if Google has “a good-faith belief that access, use, preservation or disclosure of the information is reasonably necessary to … protect against harm to the rights, property or safety of Google, our users or the public as required or permitted by law.” Someone pulling together a class action against Google for daring to perform this kind of search would no doubt want to read it differently, but this is at least in the ballpark.

So, presume that the privacy policy doesn’t restrict you from agreeing to run the search, and that you believe that the authorities, who have shown you a copy of the terrorist planning document they’ve found, are not lying about the document’s provenance and their own good faith. Would you run the search?

Here are some reasons to say yes.

First, while the search is as broad as can be — it will look through billions of private files and emails made by hundreds of millions of people — it is exquisitely tailored to only return precisely matching results. No humans are involved in the first pass, so no one ever sees any of the stuff that doesn’t match the file, even as it is searched: a tree is falling in a forest for no human to hear. That’s a far cry from, say, pre-Revolutionary Redcoats tossing the dressers in American colonists’ bedrooms as they go house to house looking for contraband. So there’s no reason for users to be chilled by innocent (if embarrassing) material, as they might if this were a more traditional human search such as by police looking through file cabinets or clicking through citizens’ PC desktops or Web browsing histories. There’s no one in the world seeing anything new as a result of the search except what’s found: a list of user accounts containing an exact copy of a non-public terrorist planning document. In fact, the match would be so perfect that some might complain that the search is too limited — even a slight change to a copy of the document in question would make it no longer match its counterparts. Orin Kerr and Rick Salgado are among those who have noted the privacy-protecting value of these sorts of “file hash” searches.
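The all-or-nothing character of a "file hash" match can be sketched in a few lines. This is an illustrative example (the document text and digest values are invented, not from any real case): an exact copy of a file produces an identical SHA-256 digest, while even a one-character edit yields a completely different one.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as a hex fingerprint."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical document recovered from the suspect's laptop.
target = b"Step 1: acquire vehicles. Step 2: scout locations."
target_hash = sha256_hex(target)

# An exact copy held in another account matches the fingerprint...
copy = b"Step 1: acquire vehicles. Step 2: scout locations."
assert sha256_hex(copy) == target_hash

# ...but a one-character edit no longer matches at all.
edited = b"Step 1: acquire vehicles! Step 2: scout locations."
assert sha256_hex(edited) != target_hash
```

This fragility is exactly the privacy-protecting property Kerr and Salgado describe: the search can only confirm possession of a bit-for-bit identical file, never anything merely similar to it.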

Second, in this particular instance, possessing the file in question is overwhelmingly likely to be an evidentiary smoking gun — and at the very least a marker of someone worthy of further investigation. As with the development and execution of regular warrants, not everything discovered in fact indicates guilt. There can be a Law and Order-style twist that makes the seemingly guilty party in fact innocent, and perhaps framed. But that could be discovered in further investigation, or come out in a subsequent trial if the authorities pressed a case nonetheless.

Third, suppose the alternative to running the search would be an investigation with no other hot leads, allowing the terrorists to remain at large and strike again. Or, perhaps more likely, it turns out that the alternative ways to garner leads carry their own significant costs to privacy, such as in-depth searches and even detention of people deemed suspicious for other reasons — people who turn out to be innocent. Shouldn’t we weigh the costs of other more traditional forms of investigation, if they can be pursued, against the broad search with promisingly narrow results that could be run here?

Fourth, this kind of voluntary search by the likes of Google is not unprecedented. Google scans outgoing Gmail messages to see if there are attachments whose hashes — that is, digital fingerprints — match those of known images of child pornography. When this first became widely known to the public, the company explained: “Sadly all Internet companies have to deal with child sexual abuse … It’s why Google actively removes illegal imagery from our services—including search and Gmail—and immediately reports abuse … This evidence is regularly used to convict criminals.”

Google’s statement related to its broad-based searches for child pornography in otherwise-private communications — apparently a search of every one of the roughly 10 billion emails sent by its users every day — also makes clear that Google currently does not scan for anything else: “It is important to remember that we only use this technology to identify child sexual abuse imagery, not other email content that could be associated with criminal activity (for example using email to plot a burglary).”
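The shape of such a scan, stripped of scale, is simple. Below is a minimal, entirely hypothetical sketch of matching stored files against a set of known-contraband digests; the account names, file contents, and data structures are invented for illustration and bear no relation to how Google actually implements its systems. The point it demonstrates is that only exact-match accounts surface, while non-matching accounts pass through unreported.

```python
import hashlib

# Hypothetical database of digests of known contraband files.
KNOWN_HASHES = {
    hashlib.sha256(b"known contraband file A").hexdigest(),
    hashlib.sha256(b"known contraband file B").hexdigest(),
}

def flag_accounts(mailboxes: dict) -> list:
    """Return only the accounts holding a bit-for-bit copy of a known file.

    mailboxes maps an account name to a list of attachment byte strings.
    No content is retained or reported for accounts with no exact match.
    """
    flagged = []
    for account, attachments in mailboxes.items():
        if any(hashlib.sha256(a).hexdigest() in KNOWN_HASHES for a in attachments):
            flagged.append(account)
    return flagged

mailboxes = {
    "alice": [b"vacation photos"],
    "bob": [b"known contraband file A"],
}
flag_accounts(mailboxes)  # → ["bob"]
```

The design choice worth noting is that the first pass is purely mechanical: nothing about "alice" is seen, stored, or surfaced by anyone.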

To be sure, child pornography filtering — and reporting — may be a place to draw a clear line. Not only is child pornography near-universally reviled and banned, but the matching algorithm for previously-identified images boasts no false positives, and, perhaps most important, possession of the file is not only clear evidence of the crime, but quite typically the crime itself. A terrorist to-do list is primarily only evidence, not itself a crime. (Which can make you wonder: If a criminal suspect were to share the document with his attorney as part of assessing his defense, part of a communication arguably privileged against legal discovery, how would we feel if the net-wide search picked it up?)

And it’s worth noting that, independent of any search for child pornography, Facebook and other social networking services perform automated analyses on private messaging taking place through their services to detect adults pursuing a child for abuse. If the automated analysis flags a message, Facebook staff investigate further, and may report what they find to authorities.

Companies treat child pornography and abuse in a special category. But the precedent remains, and it could be tempting to extend at least the exact-match kind of search to the Paris terrorist example. In our hypothetical, the incriminating document is as uniquely associated with wrongdoing as a child abuse image even though possession of the document is not the primary wrongdoing itself. A particularized search for a specific document with a unique hash is simply not the same kind of fuzzy search as AIs rummaging through private messages looking for something that, say, generally mentions committing a crime.

So far, here’s our ledger in favor of running the search: the search is finely tailored; it is likely to unmask dangerous criminals; lives could be quite plausibly saved; innocent people are unlikely to be caught up in the dragnet; and there is already some precedent for this kind of searching — and those other searches are taking place continuously rather than episodically.

Here are some reasons not to do it nonetheless.

First, you can fight the hypothetical — or at least deem it too rare to apply to any real-world situations. Perhaps you just can’t imagine a setup as clean and tempting as a terrorist group’s digital to-do list found on a suspect’s laptop, with other bad people out there possessing the same document, only a search away. If so, making a theoretical exception to a no-search policy to account for this situation has little to no upside, and opens the door for serious abuse, such as demanding a search for documents that aren’t uniquely associated with a current or impending terrorist attack, or that have already gone public and so might be within lots of unaffiliated people’s accounts. Policymaking by a single case is a bad idea — unless the case represents a meaningful pattern.

Any time examples are brought to bear, they ought to be testable to see if they really are as described, and, perhaps more important, credible evidence should be offered and maintained about the larger trend. If the authorities gain an appetite for this kind of search — and I believe they will — they should publicly share the criteria, debate them in the open, and if they decide to proceed, we should see together — government, company, and public — whether anything useful results. The procedures and boundaries for new searches like this should not be developed secretly, whether for intelligence gathering or law enforcement purposes.

Of course, this kind of transparency raises questions about long-term effects from using such broad search tools. Future criminals — at least sophisticated ones — would be put on notice of the kinds of searches that might be performed against them, and would shift to services that they anticipate won’t cooperate with the authorities they’re targeting. That’s a reason to doubt the long-term effectiveness of the tool, rather than to make its use secret.

Second, perhaps the tree-in-an-empty-forest search contemplated here should still be viewed as an unjustified intrusion upon the billions of people searched. If so, it might mean that the searches going on every day for child pornography are also ethically wrong to undertake. In 2004, Google introduced contextualized advertising within email — no human would read your private email, but Google’s servers would automatically seek out keywords that could result in relevant ads being placed next to the email as it was read. A number of privacy and civil liberties organizations strongly objected. That minor public outcry faded even as these and related practices have been the subject of at least one recent lawsuit against Google.

Some of the uproar arose from the creepiness, for lack of a more precise word, of a computer “knowing” enough of what was in a private email to be able to place related content next to it. But the letter to Google from the privacy organizations dwelt least on that concern. Instead, it broached a third worry for our list of reasons against scanning for the smoking gun: that of the slippery slope. In 2004, the descent down the hill was seen as starting with contextual ad placement; the kinds of processing necessary to the ad service could serve as an infrastructure for more expansive surveillance: “Google could — tomorrow — by choice or by court order, employ its scanning system for law enforcement purposes. … How long will it be until law enforcement compels Google into a similar situation?”

That worry was prescient: as we have seen, in the interim Google has begun scanning for child abuse images. But has the slope taken us to an undesirable place? I’d wager that a survey of the public would say “not yet,” which is why the announcements that Google and others scan communications to forestall child abuse have not resulted in public uproar against the companies. So perhaps there’s a location even further down the slope that should concern us: If a search for contraband documents expands beyond the comparatively well-bounded area of child pornography, there could be little stopping it from progressing incrementally to an Orwellian level of invasiveness. For example, to prevent claimed copyright infringement, we could see services compelled to scan private communications for musical tracks or videos, or links to that content. Facebook has at times done just that for its private messaging service. Whatever one’s views on copyright, the upside of applying the search technique there is surely lower than that of catching murderers, though the logic underlying the search may ultimately prove powerful enough to make it common.

Perhaps more worrisome, we could see companies undertaking proactive scanning looking not for exact matches of particular files but for phrases such as “assassinate the President.” If that were to happen, sending or receiving the text of this essay in an email could end up getting you flagged as a potential terrorist. (In the meantime, feel free to share.) Even worse, this kind of scanning, once commonplace, would become a go-to form of quite common surveillance, rather than a rare, powerful tool brought out in exigent and life-threatening circumstances. Not only would process-observing governments demand more and fuzzier searches for less and less vital purposes, but so too would authoritarian governments. Dissidents’ lives would become even more fraught, as would those of “regular” citizens who step just a little bit out of the mainstream.

And, to be sure, the risk of government overreach in the United States is real. After all, the US Constitution had to endure Richard Nixon — a chief executive who, along with his top aides, went to extraordinary lengths to use government resources to illegally spy on his perceived enemies and to concoct sham national security justifications for doing so. Watergate and some scandals since have resulted in a measure of reform and additional oversight of our domestic and foreign intelligence gathering institutions, but the power broad-based searches offer and the ease with which they can be conducted may, over the long haul, prove too tempting not to abuse.

*          *          *

So, if I worked for one of the big cloud email or file-sharing providers and were confronted with the “press enter” hypothetical, what would I do? At least in theory, and with some real trepidation, I’d run the search in that instance, and along with it publicly establish a policy for exactly how clear cut the circumstances have to be (answer: very) for future cases to justify pressing the enter key on a similar search. The company should refuse to run searches sought by governments that do not embrace the rule of law, nor in which the targeted document is not itself clearly tied to mass murder. The company must regularly disclose not only how many such searches were made, but also detail the nature of the documents sought to be matched once the investigation had resolved itself one way or the other, even though that information could be helpful to criminals. (Criminals, or at least any lawyers they might consult, also benefit when the law that bounds the government’s behavior is known — but secret law is still a bad idea.) Disclosure would provide a form of defense against the legitimate worries of a slippery slope, and also make the companies themselves function as a check against government’s undue use of the search’s power.

If companies were to refuse outright, could they still be lawfully ordered to conduct the search? I’ve asked Federal prosecutors about this every so often, and the reaction to the hypothetical has been one of genuine puzzlement on its legality, combined with some interest in thinking it through further, indicating that this type of search is not currently part of the investigators’ arsenal. And in the American constitutional framework the search is, at the moment, likely impermissible. The Gmail-wide search is indeed a search regardless of whether mere robots are undertaking it, and it appears to be exactly the sort of general fishing expedition that the Framers of the Constitution abhorred, and for which the Fourth Amendment’s particularity requirement (“particularly describing the place to be searched”) was written: To get a warrant, you have to plan to search a room, or a house, not an entire town. (For that matter, the child pornography scans that Google and others currently voluntarily undertake would likely also be unconstitutional for the government to mandate.) The only way around that would be to say that these sorts of “perfect” searches aren’t searches at all — a notion that Amie Stepanovich and Kevin Bankston effectively rebut in a piece making the case that “there is no distinction between exposure of information to a human and exposure of information to automated equipment controlled by humans.”

But if the need seems clear and urgent, the ability to execute the search trivial, and the civil liberties downsides near-nonexistent — especially when compared against other forms of searches — the legal water will find its level. If searches fitting our hypothetical are stymied out of the box, investigators will pursue an interpretation of the Fourth Amendment to allow an exception to its general rule, or novel theories will be constructed. US prosecutors might try to claim that service providers are providing material support for terrorism if they fail to undertake the search, once warned that they might be helping to circulate terrorists’ plans.

As so much of everyone’s private communications and work migrates into the hands of a few massive private companies, the net-wide search will become too tempting to leave alone. Exactly what makes it tempting is what makes it troubling. To admit that one would press enter should be a warning that we must erect both legal and technical barriers around the wrongful use of that powerful searching keyboard. And for those who believe in answering a firm “no” to ever pressing “enter” on searches like these, it’s important to say why.

About the Author

Jonathan Zittrain is the George Bemis Professor of International Law at Harvard Law School and the Harvard Kennedy School of Government, Professor of Computer Science at the Harvard School of Engineering and Applied Sciences, Director of the Harvard Law School Library, and co-founder of the Berkman Klein Center for Internet & Society. You can follow him on Twitter (@zittrain).