Tuesday, June 11, 2013

UUID for Legal Text

There has been a lot of interest and I have gotten great feedback on the post about the book I'm writing with Grant about legislative data.

Data standards are always a hot topic (relatively hot: we normalize against interest in this field in general, not against interest in the Kardashians).

Among the questions on data standards that have sparked interest is the question of how to assign unique identifiers to legal text. These are needed for many reasons, in a variety of contexts. The most straightforward is to be able to hyperlink to a specific subsection of a bill or law.

Some options for creating the unique identifier include:

  • A unique randomish code (e.g. based on the current datetime)
  • A hash of the text of the section
  • A URN or URL identifier based on a standard, human-readable path to the section (e.g. us/uscode/title26/section100)
  • Some combination of the above
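
To make these options concrete, here is a minimal Python sketch of each one. It is an illustration under my own assumptions, not a proposed standard. The function names are hypothetical. `uuid.uuid1()` stands in for the "randomish" datetime-based code because it mixes in a timestamp. SHA-256 stands in for whatever hash one might choose. The `path@hash` form of the combined identifier is just one possible convention.

```python
import hashlib
import uuid


def random_id() -> str:
    """Option 1: an opaque, 'randomish' code. uuid1() combines the current
    timestamp with a node id, so collisions are very unlikely."""
    return str(uuid.uuid1())


def content_hash_id(text: str) -> str:
    """Option 2: a hash of the section text. Any change to the text,
    even a comma, yields a completely different id."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def path_id(*parts: str) -> str:
    """Option 3: a human-readable path, e.g. us/uscode/title26/section100."""
    return "/".join(part.strip("/").lower() for part in parts)


def combined_id(path: str, text: str, prefix_len: int = 12) -> str:
    """Option 4 (one possible combination): the readable path plus a short
    prefix of the content hash, joined with '@' (an arbitrary choice)."""
    return f"{path}@{content_hash_id(text)[:prefix_len]}"


section = path_id("us", "uscode", "title26", "section100")
print(section)  # us/uscode/title26/section100
print(combined_id(section, "Text of the section as enacted."))
```

The sketch makes the trade-off visible. The path is readable and survives amendments, but it says nothing about which text it points to. The hash pins down the exact text, but it changes with every edit.
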
Version control is a very important consideration: Section 100 of Title 26 may be amended, and the identifier should tell us which version we're citing. Some very technically savvy minds at the Law Revision Counsel of the U.S. House of Representatives have suggested a combined approach: one identifier for the Code section, and one that specifies the version (e.g. the version as amended by P.L. 114-XYZ).
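
The combined approach could be modeled as a pair: a stable identifier for the section, plus an optional identifier for the version. Several details below are my own assumptions:

  • the `CitationId` name
  • the `@` separator
  • the `pl114-xyz` label, a placeholder for P.L. 114-XYZ

In this sketch, a citation without a version means "the current text", while a citation with a version pins one specific amended text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CitationId:
    """Hypothetical two-part identifier: a stable section id plus an
    optional version id naming the law that produced this text."""
    section: str                   # e.g. "us/uscode/title26/section100"
    version: Optional[str] = None  # e.g. "pl114-xyz"; None means "current text"

    def __str__(self) -> str:
        return self.section if self.version is None else f"{self.section}@{self.version}"

    @classmethod
    def parse(cls, text: str) -> "CitationId":
        # partition() splits on the first '@'; sep is "" when there is no '@'.
        section, sep, version = text.partition("@")
        return cls(section, version if sep else None)


current = CitationId("us/uscode/title26/section100")
amended = CitationId("us/uscode/title26/section100", "pl114-xyz")
print(current)  # us/uscode/title26/section100
print(amended)  # us/uscode/title26/section100@pl114-xyz
```

Keeping the section part unchanged across versions means that links to "Section 100" can still be grouped together. The version part lets a citation say exactly which text it meant.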

Another question is whether the id should itself carry information about the text. In the case of a hash, we could use a similarity-preserving hash, e.g. simhash, so that related texts produce hashes that differ in only a few bits. This might have advantages, for example, in citing to court documents. Text in one court opinion that is similar to text in another may provide useful precedent; a search algorithm could collect similar text sections based on their simhashes.
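
For the curious, here is a minimal sketch of a simhash in Python. It follows Charikar's approach: hash each word, let each word vote on every bit, and keep the majority. The tokenization (lowercased words), the 64-bit size and the BLAKE2b per-token hash are my assumptions. A production system would likely weight the tokens or use word shingles.

```python
import hashlib
import re

BITS = 64  # fingerprint size; blake2b below is set to 8 bytes = 64 bits


def _token_hash(token: str) -> int:
    """Stable 64-bit hash of one token (built-in hash() is salted per run)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def simhash(text: str) -> int:
    """Charikar-style simhash: every word votes +1/-1 on each bit position,
    and the fingerprint keeps the bits with a positive total."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        h = _token_hash(token)
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests similar text."""
    return bin(a ^ b).count("1")


original = ("The Secretary shall prescribe such regulations as may be necessary "
            "to carry out the purposes of this section, including regulations "
            "relating to the reporting of income by individuals and estates.")
amended = original.replace("necessary", "appropriate")
unrelated = ("Appeals from final judgments of the district court may be taken "
             "within thirty days after entry of judgment in the civil docket.")

print(hamming_distance(simhash(original), simhash(amended)))
print(hamming_distance(simhash(original), simhash(unrelated)))
```

Two versions of a subsection that differ by one word should typically land only a few bits apart. Unrelated text should differ in roughly half of the 64 bits. A search could then cluster sections whose fingerprints fall within some small distance of each other.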

Rather than get ahead of myself and draft out the entire chapter on unique identifiers, I'll stop here and invite your comments.
  • What is important to preserve in a unique identifier for legal texts?
  • What id schemes have proven successful in other document-based structures?
  • What would Google (or Linus Torvalds) do?

If you have Insights, or connections to People With Insights, please comment here or let me or Grant know.


Tuesday, June 4, 2013

First Commit: Legislative Data, the Book

I'm writing a book with Grant. This may be a surprise to him. We've discussed the book, we're planning on it, we've even begun to flesh out many of the ideas in our blogs. But we hadn't said anything publicly about it until now. Grant's in Hong Kong this week for work, so I figured it's a perfect time for me to commit us publicly to this project and deal with the consequences when he's back.

By the time he's in California again, I'm hoping that expectations have grown such that we just have to bite the bullet and write. I am anticipating a reading audience of dozens, but hope for an impact on millions. And that is where I'm counting on you. In typical esoteric policy tech fashion, I've created a GitHub repository with our first commit, and a wiki with my very first draft of a table of contents: https://github.com/aih/LegislativeDataBook/wiki/Table-of-Contents

We'll cover legislative data standards (e.g. Akoma Ntoso, SLIM), data format wars (HTML, XML, JSON, RDF), policy (e.g. DATA Act) and drafting decisions, positive law codification, open government and transparency, tools of the trade and more. Take a look and see what I've missed or what I've messed up.

Because it's on GitHub, you can fork the repository, make a branch, suggest changes or even submit a pull request. Suggest a new chapter, suggest a better title or subtitle for an existing chapter. Write a first draft or prepare to comment on our drafts (which may or may not be committed to GitHub before publishing; a lot may change after Grant reads this post). Or leave your comments here. And if you make extensive comments or edits, maybe that means that you should go ahead and write your own d#&!n book. Or join us as a co-author.

Thursday, May 30, 2013

U.S. House Legislative Data and Transparency Conference 2013

Two obvious questions:

1. What have I been up to while neglecting this blog for so long?
2. What was so important that I'm back to blog again?

I'll give away the second answer first: the U.S. House of Representatives (yes, that U.S. House) held its second annual Legislative Data and Transparency Conference. If you want to hear who was there and what happened, read the latest blog post by Grant Vergottini. If you've read this far, you should subscribe to his blog anyway and get a heads up on all of his posts. Go ahead, follow the link. I'll wait for you here, while I formulate a fascinating and hopeful political point about the conference and its participants for the end of this post. No peeking; do check out Grant's post first.

Welcome back.

The answer to the first question is related to the conference you just read about. I have been buried deep in projects to redefine how legislative data is produced and consumed, joining Grant, Bradlee Chang, Patrick Andries and François Yergeau at a small (soon to be not-so-small) company called Xcential. My main focus has been to work with Eridan Otto, Francisco Cifuentes, David Vilches and others in the amazing IT team led by Christian Sifaqui at the Chilean Library of Congress (la Biblioteca del Congreso Nacional de Chile, for those of you who are Googling en español). Together we are building the world's most advanced web-based editor for legislation. Immodest, but true.

We're building the editor based on Xcential's LegisPro Web, which grew out of a couple of hackathons that I organized with Grant, Charles Belle and others. The original idea was to build a tool that would make it easy for lawyers and people who care about policy to add semantic data to legislation. That's how Jim Harper at the Cato Institute (yes, the how-i-learned-to-stop-worrying-and-love-climate-change Cato Institute) is using the editor in his ambitious DeepBills project, which you read about in Grant's post.

Chile's Library of Congress has built an impressive linked data ontology to identify hundreds of thousands of concepts, from people such as President Eduardo Frei to government positions such as the Minister of Public Works. Using the editor, Chile has a team of people working full time to add this rich semantic data back into published laws and Congressional records. This opens up powerful avenues for research, like finding out which senator introduced the most bills, supported the most legislation dealing with his home province, or had the most floor time in the Senate. This goes far beyond the mere computerization of legislative text, toward a rich semantic web of legislative data, as envisioned in Grant's 2011 post.
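
Once records carry concept identifiers instead of free-text names, questions like these become simple aggregations. The sketch below is entirely hypothetical. The bill records, the "ex:" identifiers (standing in for real linked-data URIs) and the function names are invented for illustration. They do not reflect the Library's actual ontology or data model.

```python
from collections import Counter

# Hypothetical annotated bills. The "ex:" identifiers stand in for the
# linked-data URIs an ontology would assign to people and places.
BILLS = [
    {"id": "bill-1", "sponsors": ["ex:senator/a"], "places": ["ex:province/x"]},
    {"id": "bill-2", "sponsors": ["ex:senator/a", "ex:senator/b"], "places": ["ex:province/y"]},
    {"id": "bill-3", "sponsors": ["ex:senator/b"], "places": ["ex:province/x"]},
    {"id": "bill-4", "sponsors": ["ex:senator/a"], "places": ["ex:province/x"]},
]


def bills_per_sponsor(bills):
    """Count the bills each sponsor appears on, keyed by concept id."""
    return Counter(sponsor for bill in bills for sponsor in bill["sponsors"])


def bills_about_place(bills, senator, place):
    """Ids of bills sponsored by `senator` and annotated with `place`."""
    return [b["id"] for b in bills if senator in b["sponsors"] and place in b["places"]]


print(bills_per_sponsor(BILLS).most_common(1))                    # [('ex:senator/a', 3)]
print(bills_about_place(BILLS, "ex:senator/a", "ex:province/x"))  # ['bill-1', 'bill-4']
```

The counting itself is trivial. The hard part is the annotation work. Because every mention of a senator resolves to the same identifier, a count cannot be split by spelling variants or confused by two people with similar names.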

And that brings me to my fascinating and hopeful point about the Legislative Data and Transparency Conference. There were over a hundred people there, actively engaged, from the extremes to the middle of the political spectrum. And they agreed. They didn't agree on whether to publish APIs or bulk data; that's the hard stuff for this crowd (cf. Joshua Tauberer's Data Maturity Model). They agreed on the fundamental principle that our laws should be readable, shareable and searchable. It's a poor reflection on our society that we can easily do this with gossip about the Kardashians, but not so easily with the proposed Immigration law.

Even more impressive than this agreement in principle, this odd group of bedfellows is working together on concrete projects to make government, and particularly the legislative branch, more efficient and transparent. Hudson Hollister has brought together a powerful coalition in the Data Transparency Coalition to promote the DATA Act, which would require government spending data to be standardized and published. The U.S. House has launched a modernization project (pdf) that brings together two offices: the Office of the Legislative Counsel, which drafts new laws for Congress, and the Office of the Law Revision Counsel, which compiles and codifies the laws. Already, this project has begun to develop modern data standards based on XML Schema that can be used for bills, laws and the U.S. Code. These standards are related to emerging international standards for legislative data, like Akoma Ntoso, which is being used by the Chilean Congress.

If there is one takeaway from the recent conference for me, it is that legislative transparency now has a posse. It may not be the lobby for version control I still hope for, and it may not be as powerful as the NRA, ACLU or AARP. But this group of technicians, technocrats and politicians is making things happen, so far without politics or polemics. And the change that this conference signals could transform how we learn about, access and participate in government. Other than that, Mrs. Lincoln, not much happened at the conference.

Corrections:
This post originally gave the wrong Spanish name for Chile's Library of Congress. I have also now given more accurate credit to the full IT team there. More about the Library's groundbreaking legislative IT work can be found here, here and here.

If you see errors in my posts, please do let me know ASAP!!

Monday, September 17, 2012

Guest Post at VoxPopuLII

Grant Vergottini and I were given the opportunity to write about our exploits with legal hackathons and legislative editors over at the VoxPopuLII blog. Do go over there and disagree with us in the comments. We welcome a heated discussion.