The Biggest Database in the World

James Bamford has a superb review of the new book by Matthew Aid about the US National Security Agency (NSA) in the New York Review of Books this month. What seems to be causing a stir around the intelligence research (and computing) community is the reference to a report by the MITRE Corporation into the information needs of the NSA in relation to the new central NSA data repository being constructed in the deserts of Utah. The report, which is rather speculative, says that IF the trend for increasing numbers of sensors collecting all kinds of information continues, then the kind of storage capacity required would be in the range of yottabytes by 2015. As the CrunchGear blog points out, there are “a thousand gigabytes in a terabyte, a thousand terabytes in a petabyte, a thousand petabytes in an exabyte, a thousand exabytes in a zettabyte, and a thousand zettabytes in a yottabyte. In other words, a yottabyte is 1,000,000,000,000,000GB.” However, CrunchGear misses the ‘ifs’ in the report, as some of the comments on the story point out.

There is no doubt, however, that the NSA will have some technical capabilities that are way beyond what the ordinary commercial market currently provides, and it’s probably useless to speculate just how far beyond. Perhaps more important, in any case, are the technologies and techniques required to sort such a huge amount of information into usable data and to create meaningful categories and profiles from it – that is where the cutting edge is. The size of storage units is not really even that interesting…

The other interesting thing here is the hint of competition within US intelligence that never seems to stop: just a few months back, the FBI was revealed to have its Investigative Data Warehouse (IDW) plan. Data warehouses or repositories seem to be the current fashion in intelligence: whilst the whole rest of the world moves more towards ‘cloud computing’ and more open systems, they collect it all and lock it down.
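For anyone who wants to check the unit arithmetic in the CrunchGear quote, here is a minimal Python sketch. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000, as the quote does; the function name `gigabytes_in` is just an illustrative choice.

```python
# Decimal (SI) prefixes in ascending order; each step is a factor of 1,000.
PREFIXES = ["giga", "tera", "peta", "exa", "zetta", "yotta"]


def gigabytes_in(prefix: str) -> int:
    """Return how many gigabytes make up one <prefix>byte (decimal units)."""
    steps = PREFIXES.index(prefix)  # thousand-fold steps above "giga"
    return 1000 ** steps


if __name__ == "__main__":
    for prefix in PREFIXES:
        print(f"1 {prefix}byte = {gigabytes_in(prefix):,} GB")
```

Five thousand-fold steps separate a gigabyte from a yottabyte, so the last line printed matches the figure quoted: 1,000,000,000,000,000 GB.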

FBI data warehouse revealed by EFF

Tenacious FoI and ‘institutional discovery’ work both in and out of the US courts by the Electronic Frontier Foundation has resulted in the FBI releasing lots of information about its enormous dataveillance program, based around the Investigative Data Warehouse (IDW). 

The clear and comprehensible report is available from EFF here, but the basic messages are that:

  • the FBI now has a data warehouse with over a billion unique documents, or seven times as many as are contained in the Library of Congress;
  • it is using content management and datamining software to connect, cross-reference and analyse data from over fifty previously separate datasets included in the warehouse. These include, by the way, the entire US-VISIT database, the No-Fly list and other controversial post-9/11 systems;
  • the IDW will be used for both link and pattern analysis using technology connected to the Foreign Terrorist Tracking Task Force (FTTTF) program, in other words Knowledge Discovery in Databases (KDD) software, which will, through connecting people, groups and places, generate entirely ‘new’ data and project links forward in time as predictions.

EFF conclude that datamining is the future for the IDW. This is true, but I would also say that it was the past and is the present too. Datamining is not new for the US intelligence services; indeed, many of the techniques we now call datamining were developed by the National Security Agency (NSA). There would be no point in the FBI just warehousing vast numbers of documents without techniques for analysing and connecting them. KDD may well be more recent for the FBI, and this phildickian ‘pre-crime’ is most certainly the future in more ways than one…

There is a lot that interests me here (and indeed, I am currently trying to write a piece about the socio-technical history of these massive intelligence data analysis systems), but one issue is whether this complex operation will ‘work’ or whether it will throw up so many random and worthless ‘connections’ (the ‘six degrees of Kevin Bacon’ syndrome) that it will actually slow down or damage actual investigations into real criminal activities. That all depends on the architecture of the system, and that is something we know little about, although there are a few hints in the EFF report…
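To give a rough feel for the ‘six degrees’ problem, here is a small Python sketch. It is purely illustrative and says nothing about how the IDW actually works: the network is a made-up tree in which every person has the same number of new contacts, and the names `toy_network` and `reachable_within` are my own. It simply counts how many people end up ‘linked’ to one starting person as the number of permitted hops grows.

```python
from collections import deque


def toy_network(contacts_each, depth):
    """Build a hypothetical contact network as a dict of person -> contacts.

    Links point outwards from "suspect", and every person at each level
    is given contacts_each brand-new contacts (an assumption, not real data).
    """
    graph = {}
    frontier = ["suspect"]
    next_id = 0
    for _ in range(depth):
        new_frontier = []
        for person in frontier:
            contacts = [f"p{next_id + i}" for i in range(contacts_each)]
            next_id += contacts_each
            graph[person] = contacts
            new_frontier.extend(contacts)
        frontier = new_frontier
    return graph


def reachable_within(graph, start, max_hops):
    """Breadth-first search: everyone linked to start in at most max_hops steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)  # the starting person is not a 'connection'
    return seen


if __name__ == "__main__":
    network = toy_network(contacts_each=10, depth=3)
    for hops in (1, 2, 3):
        linked = reachable_within(network, "suspect", hops)
        print(f"within {hops} hop(s): {len(linked)} people")
```

With only ten contacts per person, one hop gives 10 people, two hops 110 and three hops 1,110. Real contact networks are far denser and overlap heavily, so the numbers would differ, but the basic growth is why indiscriminate link-following can bury an investigation in ‘connections’.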

(thanks to Rosamunde van Brakel for the link)

Behind the cameras

While the vast majority of those monitoring CCTV screens are probably decent people who stick within the legal and ethical guidelines (such as they are), it is worth remembering that pervasive surveillance offers unprecedented opportunities to perverts, stalkers and sex offenders. This is not just a matter of secret cameras set up by weirdo voyeurs; it also involves the people who work with CCTV. This was noted by Clive Norris and collaborators back in the 1990s in Britain in their work on control rooms, when they reported on operators making private tapes of women they saw in the street. Yesterday, The Daily Telegraph reported on a case in the US, where two FBI agents spied on girls changing for a charity fashion show for the underprivileged. They have been charged with criminal violation of privacy, which I am glad to see is a crime in the US. But don’t forget that behind the cameras, if there is anyone there at all these days, is a human being, and that human being has as many flaws and secret desires as anyone else.

Hip Hop Cops

Alchemist album from 2003

Eric Nielson has written an interesting article entitled ‘Watching Rap’ on police surveillance of hip-hop artists in the USA. It’s worth a read and has some nice analysis of the response in rap lyrics, which is a pleasant change from the concentration on mainstream film and fiction that you tend to get in Surveillance Studies. However, it is unfortunately illustrated with a lot of rather irrelevant, clichéd images of CCTV cameras, Banksy etc., and is rather lacking in a deeper political context. It is not as if rappers are the first group of popular cultural figures, or the first African Americans, to be put under surveillance by the US state: he should perhaps have looked back at least to the Black Panthers and the FBI’s COINTELPRO program of the 1960s. This isn’t just a cultural connection: Nielson starts off with the rumours around the shooting of Tupac Shakur, whose mother was, of course, deeply involved in the Panthers… but a very worthwhile piece nevertheless.