James Bamford has a superb review of the new book by Matthew Aid about the US National Security Agency (NSA) in the New York Review of Books this month. What seems to be causing a stir around the intelligence research (and computing) community is the reference to a report by the MITRE Corporation into the information needs of the NSA in relation to the new central NSA data repository being constructed in the deserts of Utah. The report, which is rather speculative, says that IF the trend for increasing numbers of sensors collecting all kinds of information continues, then the kind of storage capacity required would be in the range of yottabytes by 2015 – as the CrunchGear blog points out, there are “a thousand gigabytes in a terabyte, a thousand terabytes in a petabyte, a thousand petabytes in an exabyte, a thousand exabytes in a zettabyte, and a thousand zettabytes in a yottabyte. In other words, a yottabyte is 1,000,000,000,000,000GB.” (That unit ladder is sketched in code at the end of this post.)

However, CrunchGear misses the ‘ifs’ in the report, as some of the comments on the story point out. There is no doubt, though, that the NSA will have some technical capabilities well beyond what the ordinary commercial market currently provides, and it’s probably useless to speculate just how far beyond. Perhaps more important, in any case, are the technologies and techniques required to sort such a huge amount of information into usable data and to create meaningful categories and profiles from it – that is where the cutting edge is. The size of storage units is not really even that interesting…

The other interesting thing here is the hint of competition within US intelligence that never seems to stop: just a few months back, the FBI was revealed to have its own Investigative Data Warehouse (IDW) plan. Data warehouses or repositories seem to be the current fashion in intelligence: whilst the whole rest of the world moves more towards ‘cloud computing’ and more open systems, they collect it all and lock it down.
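For anyone who wants to check the arithmetic in the CrunchGear quote, here is a minimal Python sketch of the decimal unit ladder – the unit names and loop are purely illustrative, not anything taken from the MITRE report itself:

```python
# Walk the decimal storage-unit ladder quoted above:
# each step (GB -> TB -> PB -> EB -> ZB -> YB) multiplies by 1,000.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]

gb_per_unit = 1
for name in UNITS:
    print(f"1 {name} = {gb_per_unit:,} GB")
    gb_per_unit *= 1000

# The last line printed is: 1 yottabyte = 1,000,000,000,000,000 GB,
# which matches the figure in the quote.
```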
You’re right, the size of the servers grabs the attention and is fun to speculate about and to represent with various images (a pile of DVDs to the moon and back, or whatever). Meanwhile, the main intel problem TODAY is making sense of existing data. This raises two questions, as you point out: 1) what are the security consequences of this exclusive attention to collecting and storing uninterpreted (and increasingly uninterpretable) data? 2) if some kind of data-mining AI is being planned (à la IAO/TIA, the now imploded DARPA project), how is it going to be used (both by the NSA and its private contractor partners)?