The Biggest Database in the World

James Bamford has a superb review in this month's New York Review of Books of the new book by Matthew Aid about the US National Security Agency (NSA). What seems to be causing a stir in the intelligence research (and computing) community is its reference to a report by the MITRE Corporation on the information needs of the NSA, in relation to a new central NSA data repository being constructed in the deserts of Utah. The report, which is rather speculative, says that IF the trend of increasing numbers of sensors collecting all kinds of information continues, then the storage capacity required would be in the range of yottabytes by 2015. As the CrunchGear blog points out, there are "a thousand gigabytes in a terabyte, a thousand terabytes in a petabyte, a thousand petabytes in an exabyte, a thousand exabytes in a zettabyte, and a thousand zettabytes in a yottabyte. In other words, a yottabyte is 1,000,000,000,000,000GB." However, CrunchGear misses the 'ifs' in the report, as some of the comments on the story point out.

There is no doubt, however, that the NSA will have some technical capabilities well beyond what the ordinary commercial market currently provides, and it is probably useless to speculate just how far beyond. Perhaps more important, in any case, are the technologies and techniques required to sort such a huge amount of information into usable data and to create meaningful categories and profiles from it; that is where the cutting edge is. The size of the storage units is not really even that interesting.

The other interesting thing here is the hint of competition within US intelligence that never seems to stop: just a few months back, the FBI was revealed to have its own Investigative Data Warehouse (IDW) plan. Data warehouses or repositories seem to be the current fashion in intelligence: while the whole rest of the world moves towards 'cloud computing' and more open systems, they collect it all and lock it down.
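For readers who want to sanity-check the unit ladder quoted from CrunchGear above, here is a minimal sketch in Python (assuming decimal SI units, where each step up is a factor of 1,000):

```python
# Byte-scale ladder quoted from CrunchGear, in decimal (SI) units:
# each named unit is 1,000 times the previous one.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]

GIGABYTE = 10**9  # bytes in a gigabyte (SI)
scales = {name: GIGABYTE * 1000**i for i, name in enumerate(UNITS)}

# One yottabyte expressed in gigabytes:
yb_in_gb = scales["yottabyte"] // scales["gigabyte"]
print(f"1 yottabyte = {yb_in_gb:,} GB")  # 1,000,000,000,000,000 GB
```

Which confirms the quoted figure: a yottabyte is 10^15 gigabytes, i.e. a million billion gigabytes.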