Friday, November 19, 2010

Pitfalls of Open Government Data Licensing

This week, a new columnist joins the OSBR: Jordan Hatcher is a lawyer, academic, and entrepreneur working on Intellectual Property and Internet law issues in the UK and worldwide. He writes:
Drafting new open licenses is not something to be taken lightly. The recent announcement of the Italian "Open" Data License (IODL) drives this point home quite effectively. I use quotation marks around "open," because this new license is not open and should be avoided for open data licensing. While the group behind the IODL are to be applauded for taking the initiative for making more government data accessible, the story of the IODL's development offers a topical lesson for the open licensing community.
A Tale of Two Open Government Initiatives
The UK recently announced a new Open Government Licence (OGL) for a broad range of public information. The UK government crafted this license only after examining existing open licensing solutions, such as Creative Commons and Open Data Commons, and concluding that its needs required drafting its own legal tool. Why? Their reasons are their own, but my understanding is that they wanted a single license for both data and content: CC licenses didn't fit because of issues with database rights, and the Open Data Commons licenses apply only to databases. This is a sensible reason from my perspective. The UK government went about drafting their new license in the right way -- by consulting many people in the open licensing community and gathering input from experts -- and the resulting document makes for an excellent example of how to go about this process. The OGL complies with the Open Definition, which is an important standard for defining the rights behind openness, and is effectively an "attribution-only" style of license. The OGL applies to a broad range of information produced by the government, but specifically takes into account some of the unique situations that come up with open (government) data.
The recently announced Italian Open Data License is not actually open as the Open Definition defines it -- it contains a non-commercial restriction clause -- nor does it appear to have been drafted with a great deal of attention to the specific problems of open data. Data and databases are a bit like software and a bit like content in terms of what users do with them, and data/databases have some unique legal rights, particularly in Europe where we have the Database Directive. This means that open licenses in this area should take into account the particular legal and technical challenges of open data.
This problem doesn't arise just in newer, greenfield, areas of open licensing such as open data. It's a problem that has a history in software as well.
License Pollution
Drafting workable open licenses is hard work, but the good licenses make it look easy, which is perhaps why so many people take on the task of writing their own terms. These range from the practically public domain WTFPL and all the *ware licenses (sisterware, catware, beerware, tacoware) all the way to much more restrictive and complicated homegrown licenses.
By tweaking a term, adding an addendum, or contributing a clause, you create a new license. A perfect example is the so-called BSD/MIT group of licenses, where there are limitless variants. Many of these are relatively innocuous; these licenses weren't drafted for mass consumption and so each time you change the "licensor" it produces a variant. These generally get lumped together as "BSD-style" licenses. However, the temptation is great to add just one extra clause or restriction to these licenses (since you're tinkering with them anyway...).
Each change increases the chance that you are defeating legal interoperability, as opposed to technical interoperability. This means that you may have the perfect technical solution -- the best dataset or functioning code for the job -- but the license doesn't allow its use. We inadvertently build "license silos", even within the open licensing community, that prevent use and reuse between licensed content, code, and data. This situation frustrates lawyers and techies alike.
The Pollution Solution
Thankfully there's an easy way to avoid license pollution and thus license silos: use an existing open, public license. Public licenses -- licenses drafted for mass use and often maintained by a host organisation -- offer many advantages:
  • an upgrade path for bugs and for changes in the law or prevailing practice, such as GPLv2 to GPLv3
  • public comment periods, which bring open source style "all bugs are shallow" development to the open licenses themselves
  • communities of users that help each other define common practice and approaches. Eben Moglen often describes the GPL as the "constitution of the free software community," and constitutions only become living, working documents through active participation. Using an existing open public license taps into that and helps your business.
  • increased user uptake: using existing solutions simply makes it easier for your users as they no longer have to stop and invest significant resources into figuring out a new license.
  • less cost: rolling your own and doing it right costs money and time. Take advantage of someone who has already done the hard work for you.
None of these reasons is all that different from the case for open source software development itself. Somehow I think we end up with a blind spot for these very same advantages when looking at open licensing. Self-drafting an open license should be a last resort, but sometimes it is a valid and justifiable option, such as with the UK's Open Government Licence. However, instead of having all of the UK's hard work and good drafting be used only within the UK, should we instead be looking to roll this document out as a template across Europe? It certainly would help initiatives such as those in Italy avoid the pitfalls of lone open license development.
