Thursday, June 27, 2013

UUID for legal texts: Part 451fe00e-c2fe-4c11-9f10-5f96395e2523

Creating a data-friendly reference for legal texts can be far from straightforward, as I pointed out in reference to the Supreme Court's decision this week to overturn DOMA section 3 (aka 1 USC 7).

As Tim Arnold-Moore pointed out in response to my last post on unique identifiers, not all issues can be addressed in a single identifier, and not all applications need to address all issues. Tim, who has developed legislative data systems for Tasmania, Canada (French & English) and Singapore, among others, noted that "[t]he ID schemes we chose in all these jurisdictions solved the problems we were trying to solve." Indeed, it is a lot to ask for an id scheme to solve all problems in all contexts for legal documents. But I believe it is important to identify the big categories of problems that will need solving, and to develop common id schemes for these cases. In particular, the solution Tim describes for Singapore, which "used both structural and UUID schemes side by side" and accounts for section validity, merits further amplification. I hope we can explore it as an example in our book.

The following goals, in some combination, are required for effective referencing in a variety of legal contexts:
  • Identify: Accurately and uniquely identify the source text from the assigned id.
  • Find: The id should fit into a common lookup scheme to allow retrieval of the identified text (URL is the obvious example). Ideally, the text is just one link away from its surrounding context (e.g. a section of an Act, embedded in the Act itself).
  • Validate: Confirm that the text found is the one referred to by the id.
  • Create: Creation of the identifier should be straightforward, applying a set of unambiguous rules. In my ideal, these rules would be localized to the text itself and, if necessary, its immediate textual surroundings.
  • Update: In many circumstances, the id should distinguish between a legal object (e.g. Section 3) and its current instance (e.g. Section 3 as of 12pm on January 2, 2013). This information may include changes due to the legislature's "in force" or sunsetting provisions, repeal, amendment or, in the case of DOMA section 3, invalidation by a judicial authority.
No single identifier can deal with all of these requirements, but there should be a family of id specifications that can provide a buffet to choose from for a particular legal reference.
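
To make the "side by side" idea concrete, here is a minimal Python sketch of a reference that carries a structural id, a UUID for the legal object, and a validity interval for one instance of it. This is my own illustration, not the Singapore scheme Tim describes: the `LegalRef` class, the `make_ref` helper, the "us/usc/1/7" path format and the choice of UUID namespace are all assumptions made for the example.

```python
import uuid
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LegalRef:
    structural_id: str               # hypothetical path-style id, e.g. "us/usc/1/7"
    object_id: uuid.UUID             # stable id for the legal object, across versions
    valid_from: date                 # first day this instance is treated as valid
    valid_to: Optional[date] = None  # first day it is no longer valid; None = open-ended

    def in_effect(self, on: date) -> bool:
        """Return True if this instance is valid on the given day."""
        if on < self.valid_from:
            return False
        return self.valid_to is None or on < self.valid_to


def make_ref(structural_id: str, valid_from: date,
             valid_to: Optional[date] = None) -> LegalRef:
    # uuid5 is deterministic: the same structural id always yields the same
    # UUID, so the id can be created from the reference alone (the "Create" goal).
    # The namespace is an arbitrary choice for this sketch.
    object_id = uuid.uuid5(uuid.NAMESPACE_URL, structural_id)
    return LegalRef(structural_id, object_id, valid_from, valid_to)


# DOMA section 3 (1 USC 7): enacted 1996-09-21, invalidated 2013-06-26.
doma_s3 = make_ref("us/usc/1/7", date(1996, 9, 21), date(2013, 6, 26))
print(doma_s3.in_effect(date(2013, 1, 2)))   # True
print(doma_s3.in_effect(date(2013, 6, 27)))  # False
```

The structural id stays readable for people, the UUID gives software a fixed handle on the legal object, and the dates separate the object from a particular instance of it. A real scheme would also need to record why an instance ended (repeal, amendment, sunset, judicial invalidation), which this sketch leaves out.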

A round-up of other comments on legislative ids:

+Robert Richards referred to the LEX:URN standard (anyone know what the current status is, or a link to a "live" version?). The standard uses a FRBR-like style, and requires a "Jurisdictional Registrar" to create uniform names for jurisdictions (e.g. 'eu', 'us', 'fr'). The elements within the reference are to be defined and standardized by the "national Authority". It is not clear to me how this will apply to non-national jurisdictions. Examples of the LEX:URN format (from the spec) include:
  • urn:lex:es:estado:ley:2002-07-12;123 (Spanish act)
  • urn:lex:ch;glarus:regiere:erlass:2007-10-15;963 (Glarus Swiss decree)
  • urn:lex:eu:commission:directive:2010-03-09;2010-19-EU (EU Directive)
  • urn:lex:us:federal.supreme.court:decision:1963-03-18; (US FSC decision)
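
The four examples share a visible shape: colon-separated fields, with a date and a number joined by a semicolon at the end. A rough Python sketch of splitting that shape apart follows. It handles only the pattern seen in these examples and is not a parser for the full grammar in the spec; the `parse_lex` function and the field labels are my own assumptions.

```python
from typing import NamedTuple


class LexName(NamedTuple):
    jurisdiction: str  # may itself contain ';', as in "ch;glarus"
    authority: str
    measure: str
    date: str
    number: str        # empty when nothing follows the ';'


def parse_lex(urn: str) -> LexName:
    """Split a name of the form urn:lex:<a>:<b>:<c>:<date>;<number>."""
    parts = urn.split(":")
    if len(parts) != 6 or parts[0] != "urn" or parts[1] != "lex":
        raise ValueError(f"not in the expected urn:lex shape: {urn!r}")
    jurisdiction, authority, measure, details = parts[2:]
    # Only the final field is split on ';', so "ch;glarus" stays intact.
    date, _, number = details.partition(";")
    return LexName(jurisdiction, authority, measure, date, number)


print(parse_lex("urn:lex:ch;glarus:regiere:erlass:2007-10-15;963"))
```

Note that the Glarus example already shows a sub-national jurisdiction encoded inside the first field, which is one hint at how non-national jurisdictions might be handled.
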
+Rinke Hoekstra pointed to the CEN MetaLex standard, used to represent UK and Dutch legislation. A sample reference provides linked data about the "Rome Statute of the International Criminal Court", including a link to a text source (slow to load). According to Rinke, this id scheme includes a (SHA-1) hash of the document contents, as well as a versioning mechanism (apparently a date or datetime stamp). This approach has a lot to recommend it, including the potential to connect a reference to a body of metadata, which can address other goals outlined above.
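
A content hash speaks directly to the "Validate" goal: anyone who retrieves the text can recompute the hash and compare it with the one carried in the id. The Python sketch below shows the idea using SHA-1 from the standard library. The id layout (base name, date, hash, joined by slashes) and the function names are hypothetical, chosen for illustration; this is not the actual MetaLex identifier format.

```python
import hashlib


def content_hash(text: str) -> str:
    """SHA-1 hex digest of the text, encoded as UTF-8."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def make_versioned_id(base: str, version_date: str, text: str) -> str:
    # Hypothetical layout: <base>/<version date>/<sha1 of contents>
    return f"{base}/{version_date}/{content_hash(text)}"


def validate(versioned_id: str, retrieved_text: str) -> bool:
    """Check that the retrieved text matches the hash at the end of the id."""
    expected = versioned_id.rsplit("/", 1)[-1]
    return expected == content_hash(retrieved_text)


text = "Sample statutory text."  # placeholder, not a real provision
ref = make_versioned_id("example/act/section-3", "2013-01-02", text)
print(validate(ref, text))                # True
print(validate(ref, text + " Amended."))  # False
```

One practical wrinkle: the hash is only as stable as the bytes fed to it, so whitespace, markup and encoding differences between publishers would have to be normalized before hashing for the check to be useful across sources.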

+Sean McGrath referred to the PRESTO (Public REST Object oriented) architecture (pdf at O'Reilly), and I would be interested to know how this relates to the proposed LEX:URN standard or other existing standards.

And Franklin Siler (@franksiler) mentioned the difficulty of applying an id scheme to unpublished court opinions.

So no single solution, but a number of considerations and some existing standards to help define id(s) for legal texts.