Thursday, March 13, 2014

Open Data and the Role of Government

In my last post, I made the point that government agencies should refer to and use official government sources for primary law, where available. The comments, including those from Tom Bruce of Cornell's LII and Annette Nellen of San Jose State University, underscore the important role that third-party sites like LII play in disseminating information obtained from official sources. The presence and widespread use of these sites raises a question: what is the proper role of government in publishing and disseminating primary law online?

For many years, third-party sites provided the only accessible, structured source for certain primary law, and they often still provide the best sources or interfaces for this information.* For just as long, the open government community has pointed out the failings of government sites and the areas where we want to see improvement. Examples include this excellent post from Cato's Jim Harper a year ago today, this congressional testimony from Sunlight's Dan Schuman, and the work by Hudson Hollister and the Data Transparency Coalition to pull such policy recommendations together (pdf).

In no small measure because of this public concern, government entities have become better online publishers of their own official documents. As I noted in my last post, the U.S. Code, now published in XML by the Office of the Law Revision Counsel, has come a long way since the days when it was updated on a six-year schedule (still the case for the official print edition). Other official electronic sources for primary law are also much improved: regulations.gov for rulemaking and congress.gov for bills, statutes, and other legislative information are both clear advances on their aging predecessors.

Where should these sites stop and allow the private sector to take over? Is publishing bulk XML enough?

My view is that government must go beyond publishing bulk structured data. It should provide an official online source for primary law that offers structured data (XML) presented with modern web features, including:
  • hyperlinked citations, with unique identifiers at the paragraph or section level
  • dynamic navigation of contents (e.g. navigation through tables of contents)
  • full text search
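As a rough illustration of the first feature, here is a minimal Python sketch that finds U.S. Code citations in plain text and turns each into a hyperlink with its own identifier down to the subsection level. The path-style identifier scheme, the `law.example.gov` host, and the function names are hypothetical assumptions made for this example, and the regular expression recognizes only a few common citation forms.

```python
import re

# Matches citations such as "5 U.S.C. 552" or "5 U.S.C. § 552(a)(1)".
# Simplified on purpose: many real citation forms will not match.
USC_CITATION = re.compile(
    r"(?P<title>\d+)\s+U\.S\.C\.\s+(?:§\s*)?"
    r"(?P<section>\d+[a-z]?)(?P<subdivs>(?:\(\w+\))*)"
)

# Hypothetical host, not a real government endpoint.
BASE_URL = "https://law.example.gov"


def citation_identifier(title: str, section: str, subdivs: str) -> str:
    """Build a hypothetical path-style identifier, e.g. /us/usc/t5/s552/a/1."""
    parts = ["us", "usc", f"t{title}", f"s{section}"]
    # "(a)(1)" -> ["a", "1"], one path segment per subdivision
    parts.extend(re.findall(r"\((\w+)\)", subdivs))
    return "/" + "/".join(parts)


def link_citations(text: str) -> str:
    """Wrap each recognized U.S. Code citation in an HTML anchor."""

    def to_link(match: re.Match) -> str:
        ident = citation_identifier(
            match["title"], match["section"], match["subdivs"]
        )
        return f'<a href="{BASE_URL}{ident}">{match.group(0)}</a>'

    return USC_CITATION.sub(to_link, text)
```

For example, `link_citations("See 5 U.S.C. 552(a)(1).")` wraps the citation in a link to `https://law.example.gov/us/usc/t5/s552/a/1`. Because every subdivision gets its own identifier, a link can land on the paragraph actually cited rather than the top of a long section.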
In addition, I believe that an accurate and navigable point-in-time view of the law (a kind of version control) should be included where possible. This would allow us to see the law as it was in force on any given date. It may be unrealistic for some data sources to reconstruct this kind of record for historical documents, but drafting processes going forward should build in some form of version control.
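The point-in-time idea can be sketched as an ordered list of dated versions for each section, with a binary search to find the version in force on a given date. This is a minimal Python sketch under simplifying assumptions: the `SectionHistory` class and its methods are hypothetical, amendments are assumed to be recorded in date order, and repeals, delayed effective dates, and retroactive changes are not modeled.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    effective: date  # date this version came into force
    text: str


class SectionHistory:
    """Hypothetical record of every version of a single section."""

    def __init__(self) -> None:
        self._versions: List[Version] = []

    def amend(self, effective: date, text: str) -> None:
        # Assumption: amendments arrive in chronological order.
        if self._versions and effective <= self._versions[-1].effective:
            raise ValueError("amendments must be recorded in date order")
        self._versions.append(Version(effective, text))

    def as_of(self, when: date) -> Optional[str]:
        """Return the text in force on `when`, or None if not yet enacted."""
        dates = [v.effective for v in self._versions]
        # Index of the first version that takes effect after `when`.
        i = bisect_right(dates, when)
        return self._versions[i - 1].text if i else None
```

With versions recorded for 2000-01-01 and 2010-07-01, `as_of(date(2005, 6, 30))` returns the first text, `as_of(date(2010, 7, 1))` returns the second, and any date before 2000-01-01 returns `None`. Keeping every version instead of overwriting the current text is what makes this view possible.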

What do you think government's role is in publishing primary law? In particular, how important do you think web features such as navigation and search are for the official government version?


* In this category, in addition to LII and the federation of LII sites around the world, I'd include Tim Stanley's original FindLaw and now Justia; Carl Malamud's public.resource.org; Josh Tauberer's GovTrack; OpenCongress and other Sunlight Foundation sites; Waldo Jaquith's work at statedecoded.com for state codes and statutes; weblaws.org; Xcential's own legisweb.com; and many more.