The Biggest Database in the World

James Bamford has a superb review of the new book by Matthew Aid about the US National Security Agency (NSA) in the New York Review of Books this month. What seems to be causing a stir around the intelligence research (and computing) community is the reference to a report by the MITRE corporation into the information needs of the NSA in relation to the new central NSA data repository being constructed in the deserts of Utah. The report, which is rather speculative, says that IF the trend for increasing numbers of sensors collecting all kinds of information continues, then the kind of storage capacity required would be in the range of yottabytes by 2015 – as the CrunchGear blog points out: there are “a thousand gigabytes in a terabyte, a thousand terabytes in a petabyte, a thousand petabytes in an exabyte, a thousand exabytes in a zettabyte, and a thousand zettabytes in a yottabyte. In other words, a yottabyte is 1,000,000,000,000,000GB.” However, CrunchGear misses the ‘ifs’ in the report, as some of the comments on the story point out.

There is no doubt, however, that the NSA will have some technical capabilities that are way beyond what the ordinary commercial market currently provides, and it’s probably useless to speculate just how far beyond. Perhaps more important in any case are the technologies and techniques required to sort such a huge amount of information into usable data and to create meaningful categories and profiles from it – that is where the cutting edge is. The size of storage units is not really even that interesting…

The other interesting thing here is the hint of competition within US intelligence that never seems to stop: just a few months back, the FBI was revealed to have its Investigative Data Warehouse (IDW) plan. Data warehouses or repositories seem to be the current fashion in intelligence: whilst the whole rest of the world moves more towards ‘cloud computing’ and more open systems, they collect it all and lock it down.
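As a quick check on the CrunchGear figure quoted above, the ladder of units can be worked through in a few lines. This is only a sketch, assuming decimal (SI) prefixes where each step is a factor of 1,000; the list and function names are mine and purely illustrative.

```python
# Decimal (SI) storage units, each a thousand times larger than the previous one.
UNITS = ["gigabyte", "terabyte", "petabyte", "exabyte", "zettabyte", "yottabyte"]


def gigabytes_in(unit: str) -> int:
    """Return how many gigabytes make up one `unit`, stepping by 1,000 each time."""
    steps = UNITS.index(unit)  # number of thousand-fold steps above a gigabyte
    return 1000 ** steps


if __name__ == "__main__":
    for unit in UNITS:
        print(f"1 {unit} = {gigabytes_in(unit):,} GB")
```

Five steps of a thousand separate the gigabyte from the yottabyte, which is where the fifteen zeros in the quoted figure come from.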

CIA buys into Web 2.0 monitoring firm

Wired online has a report that the US Central Intelligence Agency has bought a significant stake in a market research firm called Visible Technologies that specializes in monitoring new social media such as blogs, micro-blogs, forums, customer feedback sites and social networking sites (although not closed sites like Facebook – or at least that’s what they claim). This is interesting but it isn’t surprising – most of what intelligence agencies do has always been sifting through the masses of openly available information out there – what is now called open-source intelligence – but the fact is that people are putting more of themselves out there than ever before, and material that you would never have expected to be of interest to either commercial or state organisations is now there to be mined for useful data.

(thanks, once again to Aaron Martin for this).

Facebook forced to grow up by Canadians

Well, Facebook has finally been forced to grow up and develop a sensible approach to personal data. Previously, as I have documented elsewhere, the US-based social networking site had pretty much assumed ownership of all personal data in perpetuity. However, it has now promised to develop new privacy and consent rules and ways of allowing site users to choose which data they will allow to be shared with third parties.

So why the sudden change of heart? Well, it’s all down to those pesky Canucks. Yes, where the USA couldn’t be bothered and where the EU didn’t even try, the Canadian Privacy Commissioner, Jennifer Stoddart, declared Facebook to be in violation of Canada’s privacy laws. And it turns out that in complying it was just easier for Facebook to make wholesale changes for all customers rather than trying to apply different rules to different jurisdictions.

This suggests an interesting new phenomenon. Instead of transnational corporations being able to always seek out a country with the lowest standards as a basis for compliance on issues like privacy and data protection, a nation with higher standards and an activist regulator has shown itself able to force such a company to adjust its global operations to its much higher standard. This is good news for net users worldwide.

However, we shouldn’t rejoice too much: as Google and Yahoo have shown in the case of China, in the absence of any meaningful internal ethical standards, a big enough market can still impose distinct and separate policies that are far more harmful to the interests of individual users in those nations.

Tokyo Brandscaping and the SuiPo system

Brandscaping is a term used in marketing to describe the metaphorical landscape of brands (either for a particular brand, company or sector). However, it is also being used by some researchers, including me, to describe the way in which brands are being infiltrated into urban landscapes, with the ultimate aim of being ‘inhabitable’, perhaps even 24/7 (see for example Disney’s move into urban development with Celebration in Florida).

Contemporary brandscaping makes use of new ambient intelligence, pervasive or ubiquitous computing technologies (‘ubicomp’) and ubiquitous wireless communications to create a landscape in which the consumer is targeted with specific messages directing them to certain consumption patterns. Such communication can of course be two-way and provide corporations with valuable and very personal data on consumption patterns. As I’ve argued in many presentations over the last few years, ubicomp is necessarily also ubiquitous surveillance (what I call ‘ubisurv’ – hence the name of this blog!) because to work it requires locatability and addressability. Japan, and Tokyo in particular, has been the site for a number of cutting-edge experiments in this regard, including the ‘Tokyo Ubiquitous Technology Project’, which embedded 1000 RFID tags that can communicate with RFID-enabled keitai (mobile phones) in upscale Ginza, as well as several other pilot schemes around Ueno Park and Shinjuku.

TUTP is not all about marketing surveillance, however: part of the scheme has involved ‘Universal Design’ (UD) principles, with one experiment to embed chips in the yellow tactile tiles designed to help guide sight- and mobility-impaired people around the city, so that useful access information could be passed through specially-enabled walking sticks. I’m very interested in such experiments as they indicate an alternative direction for ubicomp environments, one which is about genuinely enabling people who are currently disabled by social and architectural norms, and creating a richer sensory landscape. They show that both surveillance and ‘scary’ technology like RFID chips can be humanised.

Unfortunately in our consumer-capitalist world (and Tokyo is the exemplary city of hyper-consumption), marketing and building brandscapes tends to take priority over enabling the excluded and the disadvantaged. But there are different ways of doing this too, which can be more or less intrusive and consensual. The other day I was talking about the growth in functionality of the Suica smart travel card system. Suica-enabled keitai can now be used for buying all sorts of things, and since 2006 there have been a growing number of ‘SuiPo’ (short for ‘Suica Poster’) sites: Suica-enabled advertising hoardings that will, on demand, send information to your mobile e-mail address about the particular advertising in which you are interested if you pass your Suica card or phone over a scanner placed next to the poster (see photos below).

The difference between SuiPo and the Ginza RFID scheme, however, is that with SuiPo it is the consumer who makes the choice whether to activate any particular poster’s additional information system. In this sense it is a development of the i-Mode system in which many keitai can read information from special barcodes embedded in magazine advertisements. It doesn’t automatically call your phone every time you pass an enabled poster, once you have signed up. Not as high-tech but slightly more consensual. However this will, of course, lead to the accumulation of a lot of data on consumption interests. This potentially generates a massive consumer surveillance tool, because it can be linked up to travel patterns (your registered Suica card sends information back on where you go – I was wrong about the absolute differences between London’s Oyster and Tokyo’s Suica systems the other day) and information about consumption.

So will this potential become reality? The page on privacy and data protection on the SuiPo website (as usual the link is hidden away at the bottom of the front page!) is pretty standard stuff except for the legitimate purposes for which the data can be used once you sign up. They are, for those who don’t read Japanese, for:

  1. Sending the specific requested information to you;
  2. Improving services;
  3. Data processing and analysis;
  4. JR East’s promotional marketing; and
  5. JR East customer questionnaires.

Purposes 2 and 3 pretty much allow JR to do anything it likes with the data once you have signed up, and there is no statement as to what can or cannot be done with data once it has been ‘mined’ – analysed and transformed into something more useful to the company or to other organisations (corporate or state) which might want to buy or access such knowledge. ‘Ubisurv’ indeed…

A juki-net footnote

I had a conversation yesterday (not a formal interview) with Midori Ogasawara, a freelance journalist and writer who used to report on privacy issues for the Asahi Shimbun newspaper. This was mainly to set up further interviews with those who are or were involved with campaigns on surveillance and privacy issues in Tokyo. However I also managed to clarify a few of my own questions about juki-net and the opposition which it attracted.

In short, there seem to have been several objections.

  1. First of all was the objection to the idea of a centralised database, which was able to link between other previously separate databases.
  2. Secondly, there was the fact that this was the national state asserting authority over both local government and citizens. Both Local Authorities and citizens’ groups had argued for ‘opt-in’ systems, whereby firstly, towns could adopt their own policies towards juki-net, and secondly and more fundamentally, individual citizens could decide whether they wanted their details to be shared.
  3. The third objection was to there being a register of addresses at all. Many people saw this simply as an unnecessary intrusion into their private lives, and in any case, the administration of welfare, education and benefits worked perfectly well before this (from their point of view), so why was such a new uniform system introduced?
  4. Next there were objections based on what was being networked. The jyuminhyo (see my summary from the other day) is not actually a simple list of individuals and where they live, but is a household registry. It might not, like the koseki, place the individual in a family line, but is still a system based on patriarchal assumptions, with a designated ‘head’ of the household, and ‘dependents’ including wives and even adult children.
  5. Then there was the question of the construction of an identification infrastructure. Whether or not juki-net is considered an identification system, it does have a unique identifying number for each citizen, and it has the potential to be built on to create exactly such a comprehensive system of national identification. Lasdec, who we talked to the other day, may not approve of this, or believe it will happen, but they are only technicians; they are not policymakers and don’t have the power or the access to know or decide such matters. And in the end, if they are required by law to run an ID system then they will have to run it.
  6. There were, as I already mentioned, objections to the potential loss or illicit sharing of personal information. I don’t think this is intrinsic to juki-net, or indeed to database systems, but of course both databases and networks make such things easier. People are also quite cynical about promises of secure systems. Lasdec may say that juki-net is secure, but there have been enough incidences of government data leaks in the past for people not to accept such assertions.
  7. Finally, Juki-net connects to the border, passport and visa system. The reason that foreigners will finally be included on the jyuminhyo (and therefore juki-net) from 2012 is not therefore to respond to long-term foreign residents’ requests for equal treatment but in fact to make it even easier to sort out and find gaikokujin, check their status, and deal with unofficial and illegal migrants. Groups campaigning for the rights of foreign workers (mainly the exploited South-East Asian and Brazilian factory workers) have therefore been very much involved. Of course it also makes it possible to connect the overseas travel of Japanese people to a central address registry.

I’ll be meeting Midori again soon, I hope, along with other researchers and objectors. I am also still hoping to be able to talk to officials from the Homusho (Ministry of Justice) and the Somusho (Ministry of Public Management, Home Affairs, Posts & Telecommunications), but they are currently passing around my request to different offices and generally delaying things in the best bureaucratic traditions!

Identification in Japan (Part 2): Juki-net

As I mentioned yesterday, one of the big developments in state information systems in Japan in recent years has been the development of the jyuminkihondaichou network system (Residents’ Registry Network System, or juki-net). Very basically juki-net is a way of connecting together the 1700 (recently restructured from 3300) local authorities’ residents’ registries (jyuminhyo). These are a record of who lives in the area and where, that are held on a multiplicity of different local computer (and even still, paper) databases. Japanese government services are always struggling to catch up with massive and swift social changes, particularly the increased mobility of people, that made first the Meiji-era koseki (family registers) and then the disconnected local jyuminhyo (which were both themselves introduced to deal with earlier waves of increased social and spatial mobility) inadequate.

Operational from 2002, juki-net is restricted by law to only transmitting four pieces of personal data (name, sex, date-of-birth and address), plus a randomly-generated 11-digit unique number. Nevertheless, the system was strongly opposed and has sparked multiple legal challenges from residents’ groups who did not want to be on the system at all, and who considered the risk of data leakage or privacy violation to be too great for the system to be lawful. These challenges were combined into one class-action suit, which finally failed at the highest level, the Supreme Court, in March 2008. The court ruled that juki-net was constitutional and that there was no serious security risk in the system itself but, according to some analysts, did not address the possibility of mistakes being made by operatives. But this would seem to me to be a problem of data protection in general in Japan, rather than an issue that is specific to juki-net. Like Brazil, but unlike Canada and the UK for example, Japan has no independent watchdog agency or commissioner for safeguarding privacy or kojin deta (personal data), and other than internal procedures, the courts are the citizen’s only recourse. In any case, as Britain’s comparatively frequent incidence of data loss by public authorities shows, even having such a system does not necessarily make for better practice. There is in Japan, as in Britain, training and advice in data protection provided by a specialist government information systems agency.
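To make the minimalism of the legal restriction concrete, the record that juki-net is permitted to transmit could be sketched roughly as follows. This is purely illustrative: the class name, field names, types and the way the number is drawn are my own assumptions, not Lasdec’s actual schema or algorithm.

```python
import secrets
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class JukiNetRecord:
    """Hypothetical shape: the four permitted data items plus the 11-digit number."""
    name: str
    sex: str
    date_of_birth: date
    address: str
    resident_number: str  # 11 digits, kept as text so leading zeros survive


def random_resident_number() -> str:
    """Illustrative only: draw 11 random decimal digits.

    A real system would also have to guarantee uniqueness across all
    residents, which this sketch does not attempt.
    """
    return "".join(secrets.choice("0123456789") for _ in range(11))


def transmitted_field_names() -> list:
    """List the names of everything the hypothetical record carries."""
    return [f.name for f in fields(JukiNetRecord)]
```

The point of the sketch is what is absent: no photo, no biometrics, no history of previous addresses.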

We interviewed officials at that government agency, Lasdec (the Local Authorities Systems Development Centre) today. Lasdec also developed and runs juki-net and is responsible for the new jyuminhyo / juki-net card that enables easy access to local (and some national) services via the web or ATM-like machines at local government offices. Unsurprisingly they were quite bemused by the opposition to juki-net, which they say was based on a lack of understanding amongst citizens about what it was, and a general fear of computers and databases. They argued that many people (including one or two local authorities) had the impression juki-net was, or was planned to be, an extensive database of all personal information held by different parts of the government, or even was the basis for a new system of national identification or indeed was a new system of national identification – indeed that was the impression one got from reading both Japanese and foreign civil and cyber-liberties groups’ reports in 2002/2003 with plenty of stories of the new Japanese ‘Big Brother’ system (see the archived collection here for example).

However Lasdec argued that both ideas were incorrect. The officials recognised both that the 11-digit unique number was adapted from a previous failed identification scheme, and that juki-net could in theory become the basis for any proposed future national ID scheme, but this was prevented by the enabling law. In any case juki-net was not even the best existing system on which to base an ID system: passport, driving licence and healthcare databases all had more information and certainly information with higher levels of personal identifiability – and no-one seems to be objecting to the amount of information contained on the driving licence system, for example. Juki-net has no photos or other biometric data and no historical information. Likewise the residents’ card can have a photo if the resident wishes, but this is not shared through juki-net, and in fact the card itself is entirely voluntary. In addition, only in one city has take-up of the card exceeded 50% of the adult population (Lasdec has detailed information on take-up but only published a ‘league table’ without percentages). You also do not lose anything by choosing not to have or use the card.

The officials at Lasdec were, as with many technical and systems engineers in both public and private sectors whom I have interviewed, far more aware of privacy, data protection and surveillance issues than most politicians and mainstream (non-technical) government officials. They did not shy away from the terms kanshi (surveillance) or kanshi shakai (surveillance society) and indeed were as critical of the unregulated spread of things like CCTV in public space as many activists. They saw themselves in fact as controllers of information flow as much as facilitators. They were committed to the minimalist model of information-sharing set out by the law governing juki-net and always wanted to find ways in which the information that needed to be shared could be shared without the creation of central databases or the exchange of additional unnecessary information. In addition, new laws came into force (in 2006) which make the residential information more private than it was before. In fact, such local registers used to be entirely public (anyone could access them), and now they are far more restricted – this only seems to have been noticed by direct marketing firms, who of course were not 100% happy with this change.

This puts me into a strange position. I have colleagues here who have been utterly opposed to juki-net, and I have always assumed that it was in some way similar or equivalent to the UK National Identity Register / ID card scheme. However in fact, it seems very similar to the ‘information clearing house’ idea which I and others have proposed for the UK, in opposition to the enormous NIR which would seem to suck in every kind of state-held information on the citizen! In addition juki-net does not require any more information from the Japanese citizen than is already held by the state, again unlike the NIR in the UK, for which multiple new forms of information are being requested by the state and indeed there are fines, and ultimately prison sentences, proposed by law for refusal to give up or update such information. In contrast, juki-net is more like the electoral register in the UK, to which hardly anyone objects.

This all makes me wonder exactly what it is that provoked such vociferous opposition to juki-net. If it is an actually or potentially repressive surveillance system, somewhat like Barthes’ famous description of Tokyo, it is one with an empty centre; there is no ‘Big Brother’, only a rather well-meaning set of bespectacled technicians who are just trying, as they see it, to make things work better so that people don’t have to keep proving who they are every time they move to a new area. Perhaps there are particular cultural and political factors (that is after all the working hypothesis of this entire project – and perhaps in making assumptions about both systems and oppositions across borders we obscure the specifics). Perhaps it is the association of the 11-digit number with previous proposed ID schemes. Perhaps, as in Germany, in new government information systems, there are resonances with older systems of identification and control that hark back to more repressive, fascist, times. Or perhaps there is a general cynicism about successive government ‘information society’ / ‘e-Japan’ / ‘i-Japan’ strategies and initiatives, each of which promises empowerment and in practice delivers more bureaucracy. These are some questions I need to explore further with other officials, academics and activists.

FBI data warehouse revealed by EFF

Tenacious FoI and ‘institutional discovery’ work both in and out of the US courts by the Electronic Frontier Foundation has resulted in the FBI releasing lots of information about its enormous dataveillance program, based around the Investigative Data Warehouse (IDW). 

The clear and comprehensible report is available from EFF here, but the basic messages are that:

  •  the FBI now has a data warehouse with over a billion unique documents or seven times as many as are contained in the Library of Congress;
  • it is using content management and datamining software to connect, cross-reference and analyse data from over fifty previously separate datasets included in the warehouse. These include, by the way, the entire US-VISIT database, the No-Fly list and other controversial post-9/11 systems.
  • the IDW will be used for both link and pattern analysis using technology connected to the Foreign Terrorist Tracking Task Force (FTTTF) program, in other words Knowledge Discovery in Databases (KDD) software, which, through connecting people, groups and places, will generate entirely ‘new’ data and project links forward in time as predictions.

EFF conclude that datamining is the future for the IDW. This is true, but I would also say that it was the past and is the present too. Datamining is not new for the US intelligence services, indeed many of the techniques we now call datamining were developed by the National Security Agency (NSA). There would be no point in the FBI just warehousing vast numbers of documents without techniques for analysing and connecting them. KDD may well be more recent for the FBI and this phildickian ‘pre-crime’ is most certainly the future in more ways than one…

There is a lot that interests me here (and indeed, I am currently trying to write a piece about the socio-technical history of these massive intelligence data analysis systems), but one issue is whether this complex operation will ‘work’ or whether it will throw up so many random and worthless ‘connections’ (the ‘six-degrees of Kevin Bacon’ syndrome) that it will actually slow down or damage actual investigations into real criminal activities. That all depends on the architecture of the system, and that is something we know little about, although there are a few hints in the EFF report…
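The worry about worthless ‘connections’ can be illustrated with a toy link analysis. The sketch below is not the FBI’s method, nor anything described in the EFF report; the graph, the names and the function are all hypothetical. It simply counts who falls within a given number of hops of a starting person, using a breadth-first search.

```python
from collections import deque


def within_hops(graph, start, max_hops):
    """Return everyone reachable from `start` in at most `max_hops` links."""
    hops = {start: 0}          # person -> number of links from `start`
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue           # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in hops:
                hops[contact] = hops[person] + 1
                queue.append(contact)
    return {person for person in hops if person != start}


# Hypothetical toy data: each person is linked to those they called, e-mailed or met.
GRAPH = {
    "suspect": ["a", "b"],
    "a": ["suspect", "c", "d"],
    "b": ["suspect", "e"],
    "c": ["a", "f"],
    "d": ["a"],
    "e": ["b", "g"],
    "f": ["c"],
    "g": ["e"],
}

if __name__ == "__main__":
    for limit in (1, 2, 3):
        print(limit, sorted(within_hops(GRAPH, "suspect", limit)))
```

Even in this tiny graph the set of ‘connected’ people grows from two to seven within three hops. In a warehouse of over a billion documents, the question is how any system decides which of those links mean anything.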

(thanks to Rosamunde van Brakel for the link)

Phorm philling

UK satirical magazine, Private Eye, this week brings the ludicrous Stop Phoul Play website to my attention. This is a corporate spin site devoted entirely to defending BT’s underhand and intrusive ‘Phorm’ online advertising technology against what it calls ‘privacy pirates’ who they claim are either being paid or pushed to damage BT.

Those listed as ‘privacy pirates’ include the excellent investigative IT journal, The Register, the Open Rights Group and the brilliant Foundation for Information Policy Research (FIPR), along with numerous bloggers and contributors to web forums. Now, it may be that some other corporations with rival technologies would like Phorm to fail, just as Microsoft probably enjoys it a great deal every time Google takes a PR hit (or vice-versa), but to suggest that everyone who makes a criticism of Phorm is secretly part of some conspiracy against BT is, frankly, either stupid or paranoid.

And there are very good reasons for being critical of Phorm in the trojan-like manner of its operation and the way in which it has been tested without the consent of users. As Private Eye also reminds us, Phorm has landed the UK government in legal trouble with the EU. It hardly needs a conspiracy to make people justifiably annoyed.

This is one of the weirder exercises in PR I have seen, not least because its paranoia and promotion of conspiracies can only be damaging to BT. Thus it is no surprise to find that, according to The Register, it is the product of the fevered imagination of Patrick Robertson, whose previous clients include the lovely General Pinochet and former Tory MP and convicted liar, Jonathan Aitken. So go take a look at Stop Phoul Play (while it still exists…) – it really is quite insane.

EU Telecommunications Directive in effect

From today, private lives in the UK will be a little less private, as EU Directive 2006/24/EC becomes part of national law.

Traffic data on e-mail, website visits and Internet telephone calls now have to be recorded and retained by Internet Service Providers (ISPs). Specifically, the Directive mandates the retention of: the source of a communication; the destination of a communication; the date, time and duration of a communication; the type of communication; the type and identity of the communication device; and the location of mobile communication equipment.
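The categories listed above map fairly naturally onto one record per communication. As a rough sketch of what an ISP might end up storing, with a class name, field names and types that are my own assumptions rather than anything specified in the Directive:

```python
from dataclasses import dataclass, fields
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetainedTrafficRecord:
    """Hypothetical record covering the categories the Directive lists."""
    source: str                    # who or where the communication came from
    destination: str               # who or where it went to
    started_at: datetime           # date and time
    duration: timedelta            # how long it lasted
    communication_type: str        # e.g. e-mail, web visit, Internet telephone call
    device_type: str               # type of communication device
    device_identity: str           # identity of the communication device
    mobile_location: Optional[str] = None  # only relevant for mobile equipment


def retained_field_names() -> list:
    """List the names of everything the hypothetical record holds."""
    return [f.name for f in fields(RetainedTrafficRecord)]
```

None of the listed categories concerns what was actually said or written. This is traffic data, but who contacted whom, when, for how long and from where is revealing enough.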

This is coming into force despite the fact that many countries and ISPs still object to the directive. It has to be said that many ISPs are objecting on grounds of cost rather than any ethical reason. German courts are yet to determine the constitutionality of the directive and Sweden is not going to implement it at all.

As with many of these kinds of laws, it was rushed through on a wave of emotion after a particular ‘trigger event’ – in this case, the 7/7 bombings in London in 2005. There was a whole lot of devious practice in the Council of Ministers to get it passed too – if the Directive had been considered as a policing and security matter, it would still have needed unanimity, which means that the objections of Germany and Sweden would have vetoed the Directive. Instead, it was reclassified as ‘commercial’ on the grounds that it was about the regulation of corporations, and commercial matters need only a majority vote. How convenient…

The Home Office in Britain says our rights are safe because of RIPA, which is hardly cause for rejoicing. My main concerns, apart from the fact that this is yet another moment in the gradual erosion of private life, are that:

1. police access will rapidly become routine rather than specific, and this could be extended to many other public authorities – the original drafts of the Communications Bill would have extended the right of access to such data to all RIPA-empowered organisations (which includes most public authorities);

2. the data will be used illicitly by ISP employees for criminal purposes (remember that most identity thefts are inside jobs) – the records will be a blackmailer’s delight;

3. there will be more ‘losses’ of this data by ISPs and others who have access to it. Remember the accidental revelation of user data by AOL in the USA?

A quarter of UK databases break privacy laws

A new report for the Joseph Rowntree Reform Trust by a very credible team, largely from the Foundation for Information Policy Research (FIPR), that combines engineers, lawyers, software developers and political scientists, has concluded that a quarter of the UK public-sector databases are illegal under human rights or data protection law. It also looks at UK involvement in some European database projects and finds all of them questionable too.

The report rates the 46 databases on a traffic light system – green, amber, red – and argues that those rated ‘red’, in particular the National Identity Register and the Communications Database, are simply unreformable and should be scrapped. This is massively important because it is based not simply on a financial, political or even an ethical position, but on the database projects’ respect for existing law. They are simply illegal, and not just massively expensive, morally questionable or politically undesirable. In fact, a quarter of all the databases were found to contravene the law and more than half were ‘problematic’ (i.e. open to challenge in court). All of those rated ‘amber’ (29 databases), the authors argue, should be subject to independent review.
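The proportions quoted above can be checked against each other with a little arithmetic. This sketch uses only the two counts given here (46 databases, 29 of them amber); it deliberately does not assume exact red and green totals, since those are not stated above, and the variable names are mine.

```python
from fractions import Fraction

TOTAL = 46   # databases rated by the report
AMBER = 29   # databases rated 'amber'

amber_share = Fraction(AMBER, TOTAL)   # share of 'problematic' systems
red_plus_green = TOTAL - AMBER         # everything that is not amber
quarter_of_total = Fraction(TOTAL, 4)  # what 'a quarter' of 46 amounts to

if __name__ == "__main__":
    print(f"amber share: {float(amber_share):.0%}")
    print(f"red and green together: {red_plus_green}")
    print(f"a quarter of {TOTAL}: {float(quarter_of_total)}")
```

So ‘more than half’ is, if anything, an understatement for the amber group. With roughly a quarter of the 46 rated red, only a handful of the remaining 17 systems can have been given a green light.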

There are a number of other major recommendations, including the reassertion of the necessity and proportionality tests contained in DP law, anonymous rights for citizens to access data, more open procurement of systems, and better training processes for civil servants. The most important and radical measures proposed, and entirely correctly in my view, are those concerning the location of data and the whole nature of UK IT development. For the former, the report recommends that the default location for sensitive personal data should be local, with national systems kept to a minimum – this appears to be rather like the ‘information clearing house’ system, as opposed to central databases, that we proposed in our Report on the Surveillance Society, but better worded and justified! In the latter case, the authors simply note that fewer than 30% of government IT projects succeed, at a cost of 16Bn GBP per annum, and that there should never be a general and aimless government IT program; rather, there should only ever be specific projects for clearly defined and justified (proportional and necessary) aims.

It is an excellent report and probably unanswerable in its logic. Tellingly, The Guardian report contains no response from any government minister…