Pattern Recognition and Pseudo-Security
One of my favorite shows on NPR is On The Media. Each week, the hosts examine a variety of topics related to the media, mostly in the US. I hear the show on Sunday mornings on New Hampshire Public Radio. On February 26, 2010, the show aired a story called “The Watchers.” It brought me back to my graduate school days and my academic roots in computer science, specifically in pattern recognition and machine learning.
The story was about the value of the massive amounts of data that each of us leaves behind as we go about our daily electronic lives. In particular, John Poindexter, convicted of numerous felonies in the early 1990s for his role in the Iran-Contra scandal (reversed on appeal), had the idea that the US government could use computers to troll through this data, looking for patterns. When I was in graduate school, deficit hawks were interested in this idea as a way to find people who were scamming the welfare system, and credit card companies were interested in using it to ferret out credit card fraud. Then George W. Bush became president and 9/11 occurred. Suddenly, Poindexter’s ideas became hot within the Defense Department.
In 2002, Bush appointed Poindexter as the head of the Information Awareness Office, part of DARPA, and Poindexter pushed the agenda of “total information awareness,” a plan to use software to monitor the wide variety of electronic data that we each leave behind with our purchases and web browsing and cell phone calls and all of our other modern behaviors. The idea was that by monitoring this data, the software would be able to alert us to potential terrorist activity. In other words, the software would be able to detect the activities of terrorists as they plan their next attack.
The On The Media story described the problems with this program, problems that we knew about way back when I was in graduate school in the early 1990s. The biggest problem is that the software is overwhelmed by the sheer volume of data being collected. This problem is similar to the problem of information overload in humans. The software can’t make sense of so much data, and “making sense” of the data is a prerequisite for being able to find patterns within it.
Why do we care about this issue? There are a few reasons. The first is that we’re spending a lot of money on this software. In a time when resources are scarce, it seems crazy to me that we’re wasting time and money on a program that isn’t working. The second reason is that data about all of us is being collected needlessly, so our privacy is potentially being invaded (if anyone or any software happens to look at the data). Poindexter’s original idea was that the data would be “scrubbed” so that identifying information was removed unless a problematic pattern was identified. That requirement has been forgotten, so our identifying information is now attached to each piece of data as it is collected. But I think the main reason we should care about this wasteful program is that it is another example of security theater, which I’ve written about before. It does nothing to make us actually safer but is instead a way of pretending that we are safer.
When I was in graduate school, I would never have thought that we would still be talking about this idea all these years later. Learning from the past isn’t something we do well.