Friday, September 20, 2013

Job Post: Create the Future of Law

Do you love law, but hate bluebooking? Do you think we should be able to apply version control to law, but know better than to just dump the U.S. Code into git?

Then you might be the person for us. We are looking for a few strong programmers with an interest in and (hopefully) understanding of law. Web stack (HTML/CSS/JS), XML and document parsing experience are all big pluses, as are ambition, integrity and an itch to change the world.

Who are we?

Xcential has built legislative drafting and repository systems for more than a decade. In the last couple of years, we have taken on a number of exciting projects, working on legislation and codification projects from Hong Kong and Chile to the U.S. House. What you see online about our work is just the tip of the iceberg.

Interested? Send a resume or a link to me at ari.hershowitz at xcential dot com. Or tweet me your interest (@arihersh) and a link to something interesting about you.

Monday, September 9, 2013

From 1789 to #OpenData2013

The First U.S. Law
I'm looking forward to the Data Transparency conference today in Washington, DC, organized by Hudson Hollister. You already know this if you've been following this blog, or my tweets (@arihersh), or have had the good fortune to speak with me about legislative data recently.

I will be participating in a panel on legislative data, moderated by the Cato Institute's Jim Harper (if you like this blog, you might like Jim's Open Bills project).

The core of my presentation will be the proposal from my previous post for Congress to launch Operation Clean Desk. Here, I want to give some morsels of historical perspective on the codification process.

The first law enacted after adoption of the Constitution was "An Act to regulate the time and manner of administering certain Oaths" in 1789. (Check Wikipedia or the Statutes at Large, 1789-1799, page 23, if you don't believe me.) Parts of that Act still survive in Title 2, §§ 21, 22 and 25, and Title 4, §§ 101 and 102, of the United States Code.

Section 25, for example, is drawn from Chapter 1, section 2 of that first Act, and was amended by an act in 1948. If Congress wants to amend it again (say, to allow the Oath of Office to be administered electronically), it would likely have to cross-reference that original Act and the 1948 amendment. That is true even though in practice the law is almost always referred to as 2 U.S.C. § 25. So in effect you would have 3 references in the amendment (1789, 1948 and U.S. Code). By contrast, if Congress wanted to amend Title 4, § 101, it could do so directly without reference to the earlier Act. Why? Because Title 4 has been enacted by Congress. How do I know? Look for the asterisk in this list of U.S. Code titles.
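To make the difference concrete, here is a toy Python sketch (my own illustration, not how the Law Revision Counsel or Legislative Counsel actually works) of which citations an amendment would likely need to carry. The lookup tables are hypothetical and contain only the two examples from this post.

```python
# Hypothetical, simplified data. Title 4 has been enacted as positive law;
# Title 2 has not, so its sections still rest on the acts they came from.
POSITIVE_LAW_TITLES = {4}

# Source acts for 2 U.S.C. § 25, as described above (labels are illustrative).
SOURCE_ACTS = {
    (2, 25): ["Act of 1789, ch. 1, sec. 2", "amending act of 1948"],
}


def references_for_amendment(title: int, section: int) -> list[str]:
    """List the citations an amendment to a Code section would likely carry."""
    code_cite = f"{title} U.S.C. § {section}"
    if title in POSITIVE_LAW_TITLES:
        # The enacted title is itself the law, so it can be amended directly.
        return [code_cite]
    # Non-positive law: the underlying acts are the law, so they are cited
    # alongside the familiar Code citation.
    return SOURCE_ACTS.get((title, section), []) + [code_cite]


print(references_for_amendment(4, 101))  # ['4 U.S.C. § 101']
print(references_for_amendment(2, 25))
```

Three references for one small section of Title 2, one for Title 4: that asymmetry is the whole case for enacting the remaining titles.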

There have been many attempts at codification of U.S. law over the years, starting with the first official codification, the Revised Statutes of the United States (1876). The U.S. Code structure was first established by Congress in 1926, and the Office of the Law Revision Counsel, an official office to compile the Code and prepare codification bills, was established in 1974 (2 U.S.C. 285). In the nearly 90 years since the Code was first created, about half of its titles have been enacted.

There are currently 8 codification projects prepared or in preparation for Congressional enactment. Technology, and political will, can accelerate this historical project. That would be a tremendous legacy for Congress and a great step forward for government transparency.

Friday, September 6, 2013

Data Transparency 2013: Operation Clean Desk

Mayor John Dore, Seattle Municipal Archives
I've been meaning to organize these papers
I have been invited to join a breakout panel on legislation at the Data Transparency conference (2013) in Washington, D.C. next week. My message will be quite simple so that everyone can then focus on the other, very impressive, speakers: Now is the time for Congress to launch Operation Clean Desk.

Cleaning up

As I have discussed before, the process of legal drafting has built up a lot of cruft over the years. Amendments on top of amendments on top of original Acts. Even politicians who want to reduce the size of government end up doing so by adding to this growing body of overlapping law.

Congress saw the fundamental solution to this problem years ago, and created the Office of the Law Revision Counsel, which prepares the U.S. Code (see 2 U.S.C. 285). This is a single, clean compilation of most U.S. federal public law. But until Congress formally enacts the titles of the Code, these compilations live in parallel with the growing sea of historical laws. The LRC also prepares draft bills that would enact individual Code titles. This process has been called positive law codification, though some (e.g. Harlan Yu) have proposed referring to it more directly as "enactment of U.S. Code titles". Passing these codification bills would be step 1 of Operation Clean Desk.

Step 2 is related: make adjustments to Congressional drafting workflow, so that new laws fit more directly into the existing structure of the U.S. Code. This process is already underway: the Office of Legislative Counsel is well aware of the value of drafting consistency and the new House Rules have focused on consistency and transparency. For example, citations in a bill should make direct references to U.S. Code sections wherever possible. There are also terrific drafting guidelines (e.g. pdf from House Legislative Counsel, 1995) about the language, form and structure of amendments.

These drafting guidelines can be updated in practice to work even better with structured data formats (e.g. the U.S. Code in XML, announced by the Speaker's office earlier this summer). I don't mean to prescribe the details of drafting rules here; there are a lot of great minds at the Office of Legislative Counsel and other places in government who work on this. Some ideas are captured in Grant Vergottini's blog post on variations of how to represent amendments, including the redlining model used in California. Other ideas (e.g. plain language drafting) can be brought to bear. What I recommend here is that Congress give priority to further standardization of the drafting process itself.

Now is the time

I hope you will agree that these are good ideas, albeit not really new ideas. What is new is the timing. The speakers list at the Data Transparency conference suggests why. While no one might have objected before, now there are key leaders in Congress who genuinely care about the technical details. Operation Clean Desk will not get them a lot of press or a lot of invitations to the Sunday talk shows. But they know it needs to happen in order to facilitate a common agenda of transparency and government efficiency.

The low-profile nature of the LRC's codification bills has often also made them low priority. But these bills are extremely important for the project of government transparency and they need a push. I believe that the high-level participants in the Data Transparency conference can come together to give these bills the priority in Congress that they deserve-- despite the lack of lobbying or special interest pressure.

At the same time, Congressional offices can be drawn into the project of drafting standardization. I envision a combination of drafting technologies and education of Congressional staff.

Jeffrey Beal's desk (CC license)
I know we have a law on that...just give me a moment
This two step process defines Operation Clean Desk for me. And there is no time like the first Data Transparency conference to begin this work.

I will be in good company at the conference: House Speaker John Boehner (R), Congressman Darrell Issa (R), Senator Mark Warner (D), and many others from both sides of the aisle. The conference keynote will be given by U.S. Deputy CTO Nick Sinai.

Monday, August 5, 2013

The Most Productive U.S. Congress Ever

It's my birthday today. Maybe that's what makes me optimistic, despite evidence of a catastrophic climate change singularity; politicians and pundits still fiddling with Weiner while the planet burns; and an immigration debate where rebuilding the Berlin Wall on our border takes precedence over rebuilding our economy. Maybe that's why I refuse to join the elite liberal media in believing that today's Congress will be the least productive in history.

Hammurabi's Code, Prologue
I'll make a prediction: today's Congress will be looked back on as the most productive. The one that set the foundations for how legislatures work in a digital age. The way I see it, political gridlock may allow this to happen. When the crew can't agree on a direction to turn the ship, it may be time to repair the ship itself. And this is what I mean:

For more than 3,700 years, laws have been written on paper [and in stone], in an excruciatingly slow, inefficient process in stuffy rooms that admit little sunshine. Two major features of the drafting process make things worse: cut-and-bite amendments (replacing text of a prior law without giving any context) and non-positive law[**]. Grant Vergottini's blog post on the readability of laws deals with the first issue quite nicely. I have discussed the difference between positive and non-positive law here before. Most of the law in the United States today is non-positive, meaning that understanding any one clause requires gathering many, sometimes dozens, of laws and amendments enacted across decades. Fixing this requires Congress to enact the legal compilations prepared by the Law Revision Counsel. It should be a straightforward, non-partisan process. And despite the doubters, I think this Congress can move this agenda in a way no other has done.

Last week, House leadership announced that the U.S. Code will be available in bulk, for the first time, in well-structured XML. (Disclosure: this was our baby) This is the first plank in a Republican platform of transparency. The next big one is the DATA Act, quietly making its way through Congress due to a stellar multifaceted coalition. And I think that, deep down, the Republicans championing these reforms know that Democrats (even President Obama, whose birthday was yesterday) agree with them on this transparency and technology agenda. When the legislation gets passed, and the public starts to realize what good has happened with little fanfare in Washington, the only thing left to do will be to fight over the credit.

Is this picture too rosy? Let me dream a bit-- it's my birthday.
** For the legal philosophy ninjas reading this (you know who you are), I am referring to positive law in the context of U.S. positive law codification. This is not the same as positive law in legal philosophy. "Huh?" you say. Don't ask me, ask the folks at the House Law Revision Counsel.  They explain it better than I ever could, here.

Thursday, June 27, 2013

UUID for legal texts: Part 451fe00e-c2fe-4c11-9f10-5f96395e2523

Creating a data-friendly reference for legal texts can be far from straightforward, as I pointed out in reference to the Supreme Court's decision this week to overturn DOMA section 3 (aka 1 USC 7).

As Tim Arnold-Moore pointed out in response to my last post on unique identifiers, not all issues can be addressed in a single identifier, and not all applications need to address all issues. Tim, who has developed legislative data systems for Tasmania, Canada (French & English) and Singapore, among others, noted that "[t]he ID schemes we chose in all these jurisdictions solved the problems we were trying to solve." Indeed, it is a lot to ask for an id scheme to solve all problems in all contexts for legal documents. But I believe it is important to identify the big categories of problems that will need solving, and to develop common id schemes for these cases. In particular, the solution Tim describes for Singapore, which "used both structural and UUID schemes side by side" and accounts for section validity, merits further amplification. And I hope we can explore it as an example in our book.

The following goals, in some combination, are required for effective referencing in a variety of legal contexts:
  • Identify: Accurately and uniquely identify the source text from the assigned id.
  • Find: The id should fit into a common lookup scheme to allow retrieval of the identified text (URL is the obvious example). Ideally, the text is itself just one link away from its surrounding context (e.g. a section of an Act, embedded in the Act itself).
  • Validate: Confirm that the text found is the one referred to by the id.
  • Create: Creation of the identifier should be straightforward, applying a set of unambiguous rules. In my ideal, these rules would be localized to the text itself and, if necessary, its immediate textual surroundings.
  • Update: In many circumstances, the id should distinguish between a legal object (e.g. Section 3) and its current instance (e.g. Section 3 as of 12pm on January 2, 2013). This information may include changes due to the legislature's "in force" or sunsetting provisions, repeal, amendment or, in the case of DOMA section 3, invalidation by a judicial authority.
No single identifier can deal with all of these requirements, but there should be a family of id specifications that can provide a buffet to choose from for a particular legal reference.
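For the Update goal, here is a minimal Python sketch, assuming a hypothetical path-style object id (us/uscode/title1/section7) and a hypothetical "@" separator for instances: the object id stays fixed, and an as-of datetime picks out the instance in effect. The dates are DOMA's enactment (September 21, 1996) and the Windsor decision (June 26, 2013); the text itself is elided.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Instance:
    """One state of a legal object, in effect from `start` until superseded."""
    start: datetime
    status: str  # e.g. "in force", "amended", "repealed", "invalidated"
    text: str


def instance_as_of(history: List[Instance], when: datetime) -> Optional[Instance]:
    """Return the instance in effect at `when` (latest start <= when), if any."""
    in_effect = [inst for inst in history if inst.start <= when]
    return max(in_effect, key=lambda inst: inst.start, default=None)


def instance_id(object_id: str, inst: Instance) -> str:
    """Hypothetical instance id: the object's path plus its effective datetime."""
    return f"{object_id}@{inst.start.isoformat()}"


# Illustrative history for DOMA section 3 (text elided).
history = [
    Instance(datetime(1996, 9, 21), "in force", "..."),
    Instance(datetime(2013, 6, 26), "invalidated", "..."),
]
noon_jan_2 = instance_as_of(history, datetime(2013, 1, 2, 12))
print(instance_id("us/uscode/title1/section7", noon_jan_2))
```

The point of the split is that links to the object never break, while anyone who needs "Section 3 as of 12pm on January 2, 2013" can still get exactly that.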

A round-up of other comments on legislative ids:

Robert Richards referred to the LEX:URN standard (anyone know what the current status is, or a link to a "live" version?). The standard uses a FRBR-like style, and requires a "Jurisdictional Registrar" to create uniform names for jurisdictions (e.g. 'eu', 'us', 'fr'). The elements within the reference are to be defined and standardized by the "national Authority". It is not clear to me how this will apply to non-national jurisdictions. Examples of the LEX:URN format (from the spec) include:
  • urn:lex:es:estado:ley:2002-07-12;123 (Spanish act)
  • urn:lex:ch;glarus:regiere:erlass:2007-10-15;963 (Glarus Swiss decree)
  • urn:lex:eu:commission:directive:2010-03-09;2010-19-EU (EU Directive)
  • urn:lex:us:federal.supreme.court:decision:1963-03-18; (US FSC decision)
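The examples above share a visible pattern after the urn:lex: prefix: jurisdiction, issuing authority, type of measure, and date/number details. A rough Python parser under that assumption (the field names are my reading of the examples, and the full specification may define more structure than this handles):

```python
from typing import NamedTuple


class LexUrn(NamedTuple):
    jurisdiction: str  # may carry a sub-unit after ";" (e.g. "ch;glarus")
    authority: str     # issuing body, e.g. "estado", "commission"
    measure: str       # type of act, e.g. "ley", "directive", "decision"
    details: str       # date and number, e.g. "2002-07-12;123"


def parse_lex_urn(urn: str) -> LexUrn:
    """Split a LEX URN into the four fields seen in the examples above."""
    prefix = "urn:lex:"
    if not urn.lower().startswith(prefix):
        raise ValueError(f"not a LEX URN: {urn!r}")
    # maxsplit=3 keeps any further colons inside the details field.
    parts = urn[len(prefix):].split(":", 3)
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields after 'urn:lex:', got {len(parts)}")
    return LexUrn(*parts)


print(parse_lex_urn("urn:lex:ch;glarus:regiere:erlass:2007-10-15;963"))
```

Even this crude split shows the appeal: the id is readable by people and trivially machine-parsable, though it says nothing about which version of the text is meant.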
Rinke Hoekstra pointed to the CEN MetaLex standard, used to represent UK and Dutch legislation. A sample reference provides linked data about the "Rome Statute of the International Criminal Court", including a link to a text source (slow to load). According to Rinke, this id scheme includes a (SHA-1) hash of the document contents, as well as a versioning mechanism (apparently a date or datetime stamp). This approach has a lot to recommend it, including the potential to connect a reference to a body of metadata, which can address other goals outlined above.
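Here is a small Python sketch of the hash-plus-version idea Rinke describes, showing how a content hash serves the Validate goal. It is not MetaLex's actual identifier syntax: the base path, the slash-separated layout and the whitespace normalization before hashing are all assumptions for illustration.

```python
import hashlib
from datetime import date


def content_hash(text: str) -> str:
    """SHA-1 of the text after collapsing whitespace (normalization is an assumption)."""
    normalized = " ".join(text.split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def make_ref(base: str, version: date, text: str) -> str:
    """Hypothetical layout: <base>/<version date>/<content hash>."""
    return f"{base}/{version.isoformat()}/{content_hash(text)}"


def validate(ref: str, text: str) -> bool:
    """Check that `text` is the text the reference was minted for."""
    return ref.rsplit("/", 1)[-1] == content_hash(text)


ref = make_ref("example/act", date(2013, 6, 27), "Article 1 ...")
print(ref)
print(validate(ref, "Article  1 ..."))  # True: only whitespace differs
print(validate(ref, "Article 2 ..."))   # False: the text changed
```

Normalizing whitespace means a trivially reformatted copy still validates, at the cost of ignoring whitespace changes that might, occasionally, matter.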

Sean McGrath referred to the PRESTO (Public REST Object oriented) architecture (pdf at O'Reilly), and I would be interested to know how this relates to the proposed LEX:URN standard or other existing standards.

And Franklin Siler (@franksiler) mentioned the difficulty of applying an id scheme to unpublished court opinions.

So no single solution, but a number of considerations and some existing standards to help define id(s) for legal texts.

Wednesday, June 26, 2013

DOMA Section 3: How to Cite it Now?

The "Defense of Marriage Act" (DOMA) Section 3 has been struck down. That may not be news to you by now. If you ask me, striking it down was the easy part. Much harder is defending the Act on the grounds that the Supreme Court should show deference to the wisdom of Congress, in the same week that you vote to strike down the core of the Voting Rights Act; the dictionary entry for "chutzpah" just got a new example. (For more on this, see Laurence Tribe's analysis.)

Somewhere in between, on the hardness scale, is figuring out how to cite DOMA section 3 now. Wikipedia admirably shows the full, correct legal citation [for DOMA], with links: Pub.L. 104–199, 110 Stat. 2419, enacted September 21, 1996, 1 U.S.C. § 7 and 28 U.S.C. § 1738C. This shows how many ways there were to cite the law, and how tricky the legal citation problem was, even before today's Court opinion. But now that section has been invalidated by the Supreme Court. That doesn't take it out of the U.S. Code or affect its legislative history. So where do we put the information that it is no longer valid under U.S. law? Lawyers will use the time-worn tradition of parentheticals, like: 1 U.S.C. § 7 (nixed by the Supreme Court) or Pub.L. 104-199, § 3 (squashed like a bug, cf. United States v. Windsor).
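One way to tame the many citations for the same text is to map each known citation to one canonical key and derive a name-based UUID from that key. A Python sketch, assuming a hypothetical namespace URL and a hand-maintained alias table (the DOMA section-to-Code mappings are the ones described in this post):

```python
import uuid

# Hypothetical namespace for legal-text ids; any fixed UUID would work.
LEGAL_TEXT_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/legal-texts")

# Hand-maintained alias table: every known citation maps to one canonical key.
CANONICAL = {
    "Pub.L. 104-199, § 3": "1 U.S.C. § 7",
    "DOMA § 3": "1 U.S.C. § 7",
    "Pub.L. 104-199, § 2": "28 U.S.C. § 1738C",
    "DOMA § 2": "28 U.S.C. § 1738C",
}


def text_id(citation: str) -> uuid.UUID:
    """Deterministic (name-based, version 5) UUID for the text a citation names."""
    key = CANONICAL.get(citation, citation)
    return uuid.uuid5(LEGAL_TEXT_NS, key)


print(text_id("DOMA § 3") == text_id("1 U.S.C. § 7"))  # True
```

Because uuid5 is deterministic, anyone with the same namespace and canonical key derives the same id without a central registry; the hard, human part is maintaining the alias table.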

But these parentheticals are not standardized, and are not logically part of the citation unit.  More concretely, in assigning a UUID to DOMA section 3, how should the court's opinion be incorporated? Assuming an XML model, is this a separate attribute on the reference element (e.g. validity="invalid")? Should there be a flag in the id itself? (e.g. href="DF3Ae8362-invalid") And should invalidation by the Court be distinguished, in the data, from repeal of the section by Congress? As was pointed out to me, this information may be added, in the future, as a Constitutionality note such as 19 U.S.C. 535 note.
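Both options from the paragraph above can be sketched with Python's standard xml.etree.ElementTree. The element and attribute names (ref, href, validity, invalidatedBy) are hypothetical, not part of any existing schema; DF3Ae8362 is the placeholder id used above.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def ref_with_attribute(uid: str, label: str, invalidated_by: Optional[str] = None) -> ET.Element:
    """Option 1: keep the id stable and record validity in separate attributes."""
    ref = ET.Element("ref", {"href": uid})
    if invalidated_by is not None:
        ref.set("validity", "invalid")
        # Distinguishes, e.g., "court" (invalidated) from "congress" (repealed).
        ref.set("invalidatedBy", invalidated_by)
    ref.text = label
    return ref


def ref_with_flagged_id(uid: str, label: str, invalid: bool) -> ET.Element:
    """Option 2: bake the validity flag into the id itself."""
    ref = ET.Element("ref", {"href": f"{uid}-invalid" if invalid else uid})
    ref.text = label
    return ref


def is_invalid(ref: ET.Element) -> bool:
    """Recognize either convention."""
    return ref.get("validity") == "invalid" or ref.get("href", "").endswith("-invalid")


option1 = ref_with_attribute("DF3Ae8362", "1 U.S.C. 7", invalidated_by="court")
option2 = ref_with_flagged_id("DF3Ae8362", "1 U.S.C. 7", invalid=True)
print(ET.tostring(option1, encoding="unicode"))
print(ET.tostring(option2, encoding="unicode"))
```

The attribute approach keeps the href stable, so existing links keep working when a law's status changes; baking the flag into the id means the id itself changes whenever the status does.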

Your thoughts are welcome. I plan to incorporate them, along with the excellent feedback (from Tim Arnold-Moore, Robert Richards, Sean McGrath and others) that I've gotten on my previous post on UUIDs, into a follow-up post on UUIDs for legal texts.

A note on the legislative history of DOMA section 3, which points to the more general need for a *unique* identifier for legal documents: For starters, DOMA was 104 H.R. 3396 (pdf), and passed as Public Law 104-199 in 1996. Its Section 2 amended "Chapter 115 of title 28, United States Code ... by adding after section 1738B" a new section, 28 U.S.C. 1738C. Section 2 of the Act thus itself carries the specific instruction to amend, while Section 3 amends Chapter 1 of Title 1 of the U.S. Code (which itself was passed as the Dictionary Act) by adding a new section 7.
Note: Following comments I received from a U.S. Code expert, this post has been corrected to reflect the correct structure of the Act and its effect on the U.S. Code.

Tuesday, June 11, 2013

UUID for Legal Text

There has been a lot of interest and I have gotten great feedback on the post about the book I'm writing with Grant about legislative data.

Data standards are always a hot topic (relatively hot-- we normalize against interest in this field in general, not against interest in the Kardashians).

Among the questions on data standards that have sparked interest is the question of how to assign unique identifiers to legal text. These are needed for many reasons, in a variety of contexts. The most straightforward is to be able to hyperlink to a specific subsection of a bill or law.

Some options for creating the unique identifier include:

  • A unique randomish code (e.g. based on the current datetime)
  • A hash of the text of the section
  • A URN or URL identifier based on a standard, human-readable path to the section (e.g. us/uscode/title26/section100)
  • Some combination of the above
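The four options can be compared side by side in a short Python sketch. The path layout mirrors the example above; the choice of SHA-256 and a 12-character hash suffix in the combined id are arbitrary assumptions.

```python
import hashlib
import uuid


def random_id() -> str:
    """Option 1: a 'randomish' time-based id (uuid1 combines time and host info)."""
    return str(uuid.uuid1())


def hash_id(text: str) -> str:
    """Option 2: a hash of the section text; changes whenever the text changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def path_id(*parts: str) -> str:
    """Option 3: a human-readable structural path."""
    return "/".join(parts)


def combined_id(text: str, *parts: str) -> str:
    """Option 4: the readable path plus a short content hash."""
    return f"{path_id(*parts)}#{hash_id(text)[:12]}"


print(path_id("us", "uscode", "title26", "section100"))
print(combined_id("(hypothetical section text)", "us", "uscode", "title26", "section100"))
```

Each option fails differently: the time-based id survives amendments but says nothing about the text, the hash changes with every edit, and the path is readable but would change if sections were renumbered.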
Version control is a very important consideration: Section 100 of Title 26 may be amended, and the identifier should tell us which version we're citing. Some very technically savvy minds at the Law Revision Counsel of the U.S. House of Representatives have suggested a combined approach, with one identifier for the Code section and one that specifies the version (e.g. the version as amended by P.L. 114-XYZ).
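The combined approach might look like this in Python: a two-part reference where the section part stays stable across amendments and the version part names the amending law. The "@" serialization is a hypothetical choice, and "P.L. 114-XYZ" is the placeholder from above.

```python
from typing import NamedTuple


class VersionedRef(NamedTuple):
    """Two-part reference: a stable section id plus the version being cited."""
    section: str  # e.g. "us/uscode/title26/section100"
    version: str  # e.g. the amending law, "P.L. 114-XYZ"

    def __str__(self) -> str:
        # Hypothetical serialization: "<section>@<version>".
        return f"{self.section}@{self.version}"

    @classmethod
    def parse(cls, ref: str) -> "VersionedRef":
        """Split on the last "@" back into section and version."""
        section, sep, version = ref.rpartition("@")
        if not sep:
            raise ValueError(f"no version in reference: {ref!r}")
        return cls(section, version)


amended = VersionedRef("us/uscode/title26/section100", "P.L. 114-XYZ")
print(str(amended))
print(VersionedRef.parse(str(amended)) == amended)  # True
```

Anything that only cares about "section 100" can compare the section field and ignore the version.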

Another question is whether the id should itself carry information about the text. In the case of a hash, we could use a similarity-preserving hash, e.g. simhash, so that texts that are related would result in hashes that are close to each other. This might have advantages, for example, in citing to court documents. Text in one court opinion that is similar to text in another may provide useful precedent; a search algorithm could collect similar text sections based on their simhashes.
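A minimal Python sketch of simhash (Charikar's technique): each feature hashes to 64 bits, each bit position accumulates +1 or -1 across features, and the sign of each total becomes one bit of the fingerprint. Using word pairs as features and truncated MD5 as the feature hash are my assumptions, and the sample sentences are only illustrations.

```python
import hashlib
import re

BITS = 64


def _feature_hash(feature: str) -> int:
    """64-bit hash of a feature: the first 8 bytes of its MD5 digest."""
    return int.from_bytes(hashlib.md5(feature.encode("utf-8")).digest()[:8], "big")


def simhash(text: str) -> int:
    words = re.findall(r"\w+", text.lower())
    # Overlapping word pairs as features, so word order counts a little.
    features = [" ".join(words[i:i + 2]) for i in range(len(words) - 1)] or words
    totals = [0] * BITS
    for feature in features:
        h = _feature_hash(feature)
        for i in range(BITS):
            totals[i] += 1 if (h >> i) & 1 else -1
    # A bit is set where the weighted vote for that position is positive.
    return sum(1 << i for i in range(BITS) if totals[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits: small distance means similar texts."""
    return bin(a ^ b).count("1")


a = "The court held that the statute violates the equal protection guarantee of the fifth amendment"
b = "The court held that the statute violates the equal protection guarantee of the fourteenth amendment"
c = "Congress shall make no law respecting an establishment of religion or prohibiting the free exercise thereof"
print(hamming(simhash(a), simhash(b)), hamming(simhash(a), simhash(c)))
```

Nearly identical sentences should land a few bits apart, while unrelated ones differ in roughly half of the 64 bits, which is what would let a search collect "nearby" passages.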

Rather than get ahead of myself and draft out the entire chapter on unique identifiers, I'll stop here and invite your comments.
  • What is important to preserve in a unique identifier for legal texts?
  • What id schemes have proven successful in other document-based structures?
  • What would Google (or Linus Torvalds) do?

If you have Insights or connections to People With Insights-- please comment here or let me or Grant know.

Tuesday, June 4, 2013

First Commit: Legislative Data, the Book

I'm writing a book with Grant. This may be a surprise to him. We've discussed the book, we're planning on it, we've even begun to flesh out many of the ideas in our blogs. But we hadn't said anything publicly about it until now. Grant's in Hong Kong this week for work, so I figured it's a perfect time for me to commit us publicly to this project and deal with the consequences when he's back.

By the time he's in California again, I'm hoping that expectations will have grown so much that we just have to bite the bullet and write. I am anticipating a reading audience of dozens, but hope for an impact on millions. And that is where I'm counting on you. In typical esoteric policy tech fashion, I've created a GitHub repository with our first commit, and a wiki with my very first draft of a table of contents.

We'll cover legislative data standards (e.g. Akoma Ntoso, SLIM), data format wars (html, xml, json, rdf), policy (e.g. DATA Act) and drafting decisions, positive law codification, open government and transparency, tools of the trade and more. Take a look and see what I've missed or what I've messed up.

Because it's on GitHub, you can make a branch, make suggestions or even make a pull request. Suggest a new chapter, suggest a better title or subtitle for an existing chapter. Write a first draft or prepare to comment on our drafts (which may or may not be committed first on GitHub before publishing-- a lot may change after Grant reads this post). Or leave your comments here. And if you make extensive comments or edits, maybe that means that you should go ahead and write your own d#&!n book. Or join us as a co-author.

Thursday, May 30, 2013

U.S. House Legislative Data and Transparency Conference 2013

Two obvious questions:

1. What have I been up to while neglecting this blog for so long?
2. What was so important that I'm back to blog again?

I'll give away the second answer first: the U.S. House of Representatives (yes, that U.S. House) held its second annual Legislative Data and Transparency Conference. If you want to hear who was there and what happened, read the latest blog post by Grant Vergottini. If you've read this far, you should subscribe to his blog anyway and get a heads up on all of his posts. Go ahead, follow the link. I'll wait for you here, while I formulate a fascinating and hopeful political point about the conference and its participants for the end of this post. No peeking, do check out Grant's post first.

Welcome back.

The answer to the first question is related to the conference you just read about: I have been buried deep in projects to redefine how legislative data is produced and consumed, joining Grant, Bradlee Chang, Patrick Andries and François Yergeau at a small (soon to be not-so-small) company called Xcential. My main focus has been to work with Eridan Otto, Francisco Cifuentes, David Vilches and others in the amazing IT team led by Christian Sifaqui at the Chilean Library of Congress (la Biblioteca del Congreso Nacional de Chile, for those of you who are Googling en español) to build the world's most advanced web-based editor for legislation. Immodest, but true.

We're building the editor based on Xcential's LegisPro Web, which grew out of a couple of hackathons that I organized with Grant, Charles Belle and others. The original idea was to build a tool that would make it easy for lawyers and people who care about policy to add semantic data to legislation. That's how Jim Harper at the Cato Institute (yes, the how-i-learned-to-stop-worrying-and-love-climate-change Cato Institute) is using the editor in his ambitious DeepBills project, which you read about in Grant's post.

Chile's Library of Congress has built an impressive linked data ontology to identify hundreds of thousands of concepts (from President Eduardo Frei to government positions, such as the Minister of Public Works). Using the editor, Chile has a team of people working full time to add this rich semantic data back into published laws and Congressional records. This opens up powerful avenues for research. Like finding out which senator introduced the most bills, or supported the most legislation dealing with his home province, or had the most floor time in the senate. This goes far beyond mere computerization of legislative text to creating a rich semantic web of legislative data. As was envisioned in Grant's 2011 post.
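Once text carries ontology links instead of free-text names, questions like these become simple counts rather than text searches. A Python sketch with invented records: the ids and field names are hypothetical and do not reflect the Library's actual ontology or data model.

```python
from collections import Counter

# Invented, illustrative records: each bill points at concept ids, not names.
bills = [
    {"id": "bill-1", "sponsor": "senator-A", "concepts": ["region-X"]},
    {"id": "bill-2", "sponsor": "senator-B", "concepts": ["region-Y"]},
    {"id": "bill-3", "sponsor": "senator-A", "concepts": ["region-Y", "ministry-Z"]},
]


def bills_per_sponsor(records: list) -> Counter:
    """Count bills by sponsor id."""
    return Counter(r["sponsor"] for r in records)


def bills_about(records: list, concept: str) -> list:
    """Ids of bills annotated with a given concept (e.g. a region)."""
    return [r["id"] for r in records if concept in r["concepts"]]


top_sponsor, top_count = bills_per_sponsor(bills).most_common(1)[0]
print(top_sponsor, top_count)          # senator-A 2
print(bills_about(bills, "region-Y"))  # ['bill-2', 'bill-3']
```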

And that brings me to my fascinating and hopeful point about the Legislative Data and Transparency Conference. There were over a hundred people there, actively engaged, from the extremes to the middle of the political spectrum. And they agreed. Not on whether to publish APIs or bulk data--that's the hard stuff for this crowd (cf. Joshua Tauberer's Data Maturity Model). But on the fundamental principle that our laws should be readable, shareable, searchable. It's a poor reflection on our society that we can easily do this with gossip about the Kardashians, but not so easily with the proposed immigration law.

Even more impressive than this agreement in principle, this odd group of bedfellows are working together on concrete projects to make government, and particularly the legislative branch, more efficient and transparent: Hudson Hollister has brought together a powerful coalition in the Data Transparency Coalition to promote the DATA Act, which would require government spending data to be standardized and published. The U.S. House has launched a modernization project (pdf) which is bringing together the Office of the Legislative Counsel, which drafts new laws for Congress, and the Office of the Law Revision Counsel, which compiles and codifies the laws. Already, this project has begun to develop modern data standards based on XML Schema that can be used for bills, laws and the U.S. Code. These standards are related to emerging international standards for legislative data, like Akoma Ntoso, which is being used by the Chilean Congress.

If there is one takeaway from the recent conference for me, it is that legislative transparency now has a posse. It may not be the lobby for version control I still hope for, and it may not be as powerful as the NRA, ACLU or AARP. But this group of technicians, technocrats and politicians are making things happen-- so far without politics or polemics. And the change that this conference signals could transform how we learn about, access and participate in government. Other than that, Mrs. Lincoln, not much happened at the conference.

This post originally gave the wrong Spanish name for Chile's Library of Congress. I have also now given more accurate credit to the full IT team there. More about the Library's groundbreaking legislative IT work can be found here, here and here.

If you see errors in my posts, please do let me know ASAP!!