The Prime Minister wants Tim Berners-Lee to open up government data:
Sir Tim Berners-Lee has told the BBC that the job he has been given by Gordon Brown is an important one that goes beyond party politics. The Prime Minister told MPs on Wednesday that he was asking the creator of the web “to help us drive the opening up of access to Government data in the web over the coming months.”
And TBL wants suggestions from bloggers:
Sir Tim refused to be specific about which items of data he would help make public, but said he was interested in suggestions from bloggers about what they would like to see.
With that in mind, I’d like to suggest the following:
(1) Use open data formats.
By this I mean formats which:
- have a standard published under an open content license (e.g. the public domain or a creative commons license). Bizarrely, many standards cannot be openly distributed for copyright reasons — it’s against the law to tell people the standard!
- are not encumbered by patents
- have an open source reference implementation. This is important because (a) standards are often complex, and a reference implementation helps to document the standard; and (b) people can use the reference implementation to read and write data in the standard. Note that the reference implementation must have all the functionality that the standard specifies: for example, if it’s a standard for word processing, the reference implementation must allow you to create documents, not just view them. And the reference implementation mustn’t be crippled, e.g. to ensure people buy a commercial implementation instead of the reference one.
For office documents, the relevant open data format is OpenDocument Format (ODF).
Note that open file formats are a direct threat to Microsoft, because with them Microsoft would lose vendor lock-in. So Microsoft have attempted to muddy the waters with their rival OOXML data format, which pretends to be open but is actually designed to be so complex that only Microsoft can ever produce software that implements it in full. Microsoft have also implemented ODF in the latest version of Microsoft Office; however, their implementation is deliberately incompatible with everyone else’s.
So in practice, using ODF means saying goodbye to Microsoft Office and vendor lock-in. Which is good in itself, because if the government stopped using Microsoft Office and migrated fully to Open Office, they’d save enormous amounts of money in license fees. And the private sector would save money too — the only reason many businesses use Microsoft Office is to be compatible with everyone else, so if the government started switching to ODF, many businesses would, and when they did, everyone else would.
(2) Don’t restrict this just to central government.
Opening up data should also apply to local government and devolved assemblies. Data from agencies at arm’s length from central government (so-called quangos) must also be included, because if it is not, there will be a temptation to hide things that might be embarrassing by putting them there. (The same applies to charities that are mostly funded by the state.)
In particular there are two agencies whose data should be opened up: Ordnance Survey, and the BBC.
(3) What license should the data be released under?
One possibility is that all this data should be put into the public domain. That’s a nice, simple solution.
But perhaps the government could get more value out of the data by releasing it under a CC BY-SA license, which requires that all licensees “share alike” by putting derived works under the same license.
Some of this data — particularly Ordnance Survey mapping data, and the BBC’s back catalog of TV and radio programmes — is of considerable commercial value. This value could be used to help UK Internet companies, by licensing the data two ways: everyone can use it under the CC BY-NC-SA license, which disallows commercial usage, and UK-based Internet companies are also allowed to use it commercially; this would give UK-based companies a competitive advantage over the rest of the world.
(4) Include detailed data on government spending.
Christopher Chantrill’s UK Public Spending website gives statistics for public spending, which you can drill down by category. But it doesn’t go into much detail, and therefore something better could be done: a much more fine-grained listing of government spending where each department and agency details everything, with each contract over a certain value (say £10,000) being listed separately. This data could all be put on a central website, but it’s more important that the raw data be made available in a standard format.
It might be thought that collating all this data would be expensive. Done right, it wouldn’t be. All government agencies have internal accounts, kept in accountancy software. What needs to be done is to create a standard data format for exporting this data so that it can be used in this way, and then to modify the accounting software used by government departments so that it automatically exports the data in this format, say once a month. The exported data should be put on the department’s website in a known and defined place (for example as an XML file, produced once a month), and all such raw data should also be put on a central web server for easy access.
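To give a rough idea of how small the export step could be, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: no such standard format exists yet, so the function name `export_spending`, the element names (`spending`, `contract`, `below_threshold`) and the shape of the input are all hypothetical. It lists each payment over the threshold separately and rolls the rest into a single total.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Suggested cut-off in pounds: payments over this are listed individually.
THRESHOLD = Decimal("10000")


def export_spending(department, period, payments, threshold=THRESHOLD):
    """Return one department's spending for one period as an XML string.

    `payments` is an iterable of (supplier, description, amount) tuples,
    with amounts given as strings such as "25000.00". The element and
    attribute names are hypothetical, not part of any real standard.
    """
    root = ET.Element("spending", department=department, period=period)
    small_total = Decimal("0")
    small_count = 0
    for supplier, description, amount in payments:
        amount = Decimal(amount)
        if amount > threshold:
            # Large payments get their own entry.
            item = ET.SubElement(
                root, "contract", supplier=supplier, amount=str(amount)
            )
            item.text = description
        else:
            # Small payments are only counted and summed.
            small_total += amount
            small_count += 1
    ET.SubElement(
        root,
        "below_threshold",
        count=str(small_count),
        total=str(small_total),
    )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example = [
        ("Acme Ltd", "IT support", "25000.00"),
        ("Bob's Stationery", "Paper", "120.50"),
        ("Carol Catering", "Lunch", "79.50"),
    ]
    print(export_spending("Example Department", "2009-06", example))
```

A real export would read the payments straight from the accounting system and write the result to the agreed location on the department’s website; the point here is only that the work lies in agreeing the format, not in producing the file.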
The necessary accounting software to do this might best be written as an open source program, so that all government agencies can use it without wasting effort or money: the same software would be used by central government, quangos, devolved and local government, and government-funded charities. And of course, if the software were well written, it’s likely that governments and agencies outside the UK would use it too; this is good, because it would mean that any add-ons they produced for it would then be available for the UK government to use at no extra cost.