NSA Data Mining: Three Points to Remember
What's still missing from the debate about domestic surveillance.
On Thursday, the Washington Post printed yet another above-the-fold headline pulled from leaked National Security Agency documents. Using location information from cell phones, the NSA has reportedly been collecting nearly 5 billion records a day totaling some 27 terabytes of data, by one account. The data can then be analyzed to flag unknown individuals traveling with known targets. As in previous cases, the NSA has asserted that its programs are not used to monitor Americans, but that some incidental collection happens.
Certainly, the scope of the project is considerable, as is the audacity of the NSA in undertaking it. However, that the NSA has the technical means to conduct such a program should hardly come as a surprise. In fact, the private sector employs technologies that are potentially much more intrusive. Given that, one takeaway from this most recent revelation is that it is just one instance of a larger pattern of action by the NSA, which means more data-mining programs will almost certainly follow. As long as the technology exists to develop new surveillance programs, national security practitioners will likely employ it to some extent.
This latest revelation provides an excellent opportunity to think critically about the use of big data in national security. Below are three points to keep in mind as the debate unfolds:
1. Your movements are being watched—and that is not news.
The state of the art in extracting novel information from massive data sets has moved far beyond merely identifying associates by their concurrent locations, as the NSA reportedly did. Private companies have been using location services to sell advertising, to learn about users' movements, and to otherwise turn a profit in ways that make the NSA's activities seem markedly less impressive. For example, iOS 7, the new iPhone operating system, has dramatically ramped up Apple's location services and recently started informing users of the expected duration of their commute to and from work. The impressive part: users do not need to input their home or work address; the phone will learn to identify home, office, and other favorite locations, all of which are viewable on a map that shows how often the phone is in each place. Location data is used to enhance the user experience, but also to automatically crowd-source information like traffic patterns and Wi-Fi hotspots.
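To make concrete how little is needed to infer "home" and "office" from raw location pings, here is a minimal Python sketch. It is not Apple's method, which is not public; the function names, the grid rounding, and the time windows (nights for home, weekday office hours for work) are all assumptions chosen purely for illustration.

```python
from collections import Counter
from datetime import datetime


def round_cell(lat, lon, precision=3):
    """Snap a coordinate to a coarse grid cell (assumed granularity)."""
    return (round(lat, precision), round(lon, precision))


def infer_places(pings):
    """Guess home and work from an iterable of (datetime, lat, lon) pings.

    Assumption: the cell seen most often at night is home, and the cell
    seen most often during weekday office hours is work.
    """
    night = Counter()
    office_hours = Counter()
    for ts, lat, lon in pings:
        cell = round_cell(lat, lon)
        if ts.hour >= 22 or ts.hour < 6:
            night[cell] += 1
        elif ts.weekday() < 5 and 9 <= ts.hour < 17:
            office_hours[cell] += 1
    home = night.most_common(1)[0][0] if night else None
    work = office_hours.most_common(1)[0][0] if office_hours else None
    return {"home": home, "work": work}
```

Calling `infer_places` on a few days of timestamped coordinates returns the two most likely cells, with no address ever entered by the user. A real system would need far more care (clustering, noise, shift workers), but the basic inference is a matter of counting.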
Regardless of the sector, information on user location is becoming an increasingly popular and powerful tool, but that fact passes below the radar of many users. Cell phone monitoring is just one of many mechanisms available for obtaining troves of data on individuals' locations. E-ZPass cards, RFID tags affixed to a car's windshield to automatically pay tolls, are being surreptitiously scanned in many more places than toll plazas in New York City. According to New York Department of Transportation officials, the tags are used to monitor traffic information. While the purpose of the program is ostensibly not law enforcement, the unannounced program is sparse on details, and users are not informed that the passes will be used to track location and travel data.
If a standard iPhone can extrapolate a user's daily work habits using the location information that it collects by default, and an E-ZPass card can be pressed into service to monitor the location of individual vehicles, it is easy to imagine how location data could be used to generate any number of inferences about users. In fact, location-based profiling was patented back in 2002. The private sector has been mining location data for more than a decade. It is no surprise that the government also possesses that capacity.
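The simpler inference attributed to the NSA, flagging unknown devices that repeatedly turn up alongside a known target, can be sketched just as briefly. This is a hypothetical illustration, not a description of any actual program: the `co_travelers` function, the `(device_id, cell_id, hour_bucket)` record layout, and the `min_shared` threshold are assumptions made for the example.

```python
from collections import defaultdict


def co_travelers(sightings, targets, min_shared=3):
    """Flag non-target devices seen with a target in several place/time slots.

    sightings: iterable of (device_id, cell_id, hour_bucket) records.
    targets: collection of known target device ids.
    Returns {(target, other_device): number_of_shared_slots}.
    """
    targets = set(targets)

    # Group devices by the place/time slot in which they were observed.
    by_slot = defaultdict(set)
    for device, cell, hour in sightings:
        by_slot[(cell, hour)].add(device)

    # Record, per (target, other) pair, the distinct slots they shared.
    shared = defaultdict(set)
    for slot, devices in by_slot.items():
        for target in devices & targets:
            for other in devices - targets:
                shared[(target, other)].add(slot)

    return {
        pair: len(slots)
        for pair, slots in shared.items()
        if len(slots) >= min_shared
    }
```

The threshold is where the policy questions live: set `min_shared` low and every commuter on the target's train is flagged; set it high and genuine associates slip through.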
2. The NSA is probably doing a pretty good job of self-monitoring, but oversight of data mining programs is still a really tricky problem.
Although headlines have proclaimed wrongdoing, particularly regarding an NSA internal audit that admitted to 2,776 violations of rules or court orders against surveillance of Americans during a twelve-month period running through May 2012, the truth is that those numbers are not especially significant relative to the quantity of data the agency handles. To give an idea of the scale, government advisor David Gewirtz calculated that of the 30 quadrillion bytes the NSA processes in a day, the data it improperly records is less than a standard MP3 file. In any other context, this would be considered a marvelously successful system. As Benjamin Wittes puts it, "what this document shows is that among the billions and billions of communications the NSA interacts with every year, it has certain low rate [sic] of technical errors, many of them unavoidable, which it dutifully records and counts."
Because the rate of violations is so much lower than what would normally be expected in such a large and complex program, it appears that the NSA is making a remarkably successful effort to self-regulate. Understandably, however, this is a fairly unsatisfying demonstration of oversight in the eyes of the American public, who are asked to place their blind faith in the system. Ultimately, operational security makes it effectively impossible for the NSA to present many or all of the details of the program and its monitoring and filtering mechanisms to the public, since that could expose vulnerabilities in the program.
Furthermore, even outside actors like the FISA courts have limited ability to police the NSA's activities. Even if they had the appropriate mechanisms, acquiring the technical expertise to make reasonable judgments would be another challenge altogether. While the NSA may welcome some regulatory entity, creating one is very tricky.
3. Data mining is now a fact of life, so we should figure out how to implement it properly.
Given the degree to which big data analytics have been integrated into the daily lives of Americans, it becomes increasingly difficult to imagine a world without them. This is probably for the better: regardless of whether they are for targeted advertising or surveillance, these tools are enormously powerful and are essentially our only practical means of addressing the influx of information now available to all internet users. This assertion is debatable, but what is indisputable is the fact that these tools are here to stay. As long as they are available to foreign intelligence services and other non-state competitors, the U.S. intelligence community should also use the tools available to it—albeit under strict limitations—or risk losing America’s competitive edge.
Exploring a range of pragmatic and specific questions might establish a more effective foundation for future data mining programs. How can the government establish effective oversight external to the NSA that requires less blind faith from the public, but does not expose vulnerabilities in collection programs? If no compelling oversight capacity is developed, and American public opinion trends away from the use of massive data analytics in national security, how much can the intelligence community withdraw from using these tools without giving an informational advantage to competing intelligence services?
Many of these questions need more than just discussion; they require changes in law. Much of the authority for the NSA's recent surveillance programs comes from post-9/11 legislation. Other authority rests on older precedent: in 1979, the Supreme Court ruled that metadata (basic information on the sender, recipient, location, and size of a piece of data like an e-mail message or credit card transaction) is not covered by the Fourth Amendment. In the age of data mining, is it time to reconsider this ruling?
Decision makers in the intelligence community and the American public alike have a lot to chew over, but ignoring these critical considerations risks doing great damage, both to U.S. national security and civil liberties.
Laura K. Bate is a program associate at The Center for the National Interest.
Image: Wikimedia Commons/Victorgrigas. CC BY-SA 3.0.