Thursday, October 22, 2015

Git for Law Revisited

Laws change. Each time a new U.S. law is enacted, it enters a backdrop of approximately 22 million words of existing law. The new law may strike some text, add some text, and make other adjustments that trickle through the legal corpus. Seeing these changes in context would help lawmakers and the public better understand their impact.

To software engineers, this problem sounds like a perfect application for automated change management. Input an amendment, output tracked changes (see sample below). In the ideal system such changes could be seen as soon as the law is enacted -- or even while a bill is being debated. We are now much closer to this ideal.

Changes to 16 U.S.C. 3835 by law 113-79

On Quora, on this blog, and elsewhere, I've discussed some of the challenges of using git, an automated change management system, to track laws. The biggest technical challenge has been that most laws, and most amendments to those laws, have not been structured in a computer-friendly way. But that is changing.

The Law Revision Counsel (LRC) compiles the U.S. Code through careful analysis of new laws, identifying the parts of existing law that will be changed (in a process called Classification) and making those changes by hand. The drafting and revision process takes great skill and legal expertise.

So, for example, the LRC makes changes to the current U.S. Code, following the language of a law such as this one:
Sample provision, 113-79 section 2006(a)
LRC attorneys identify the affected provisions of the U.S. Code and then carry out each of these instructions (strike "The Secretary", insert "During fiscal year", and so on). Since 2011, the LRC has been publishing the final result of this analysis in XML format. One consequence of this format change is that it becomes feasible to automatically match the "before" text to the "after" text and produce a redlined version, as seen above, showing the changes in context.

To produce this redlined version, I ran xml_diff, an open-source program written by Joshua Tauberer of govtrack.us, who also works with my company, Xcential, on modernization projects for the U.S. House. The results can be remarkably accurate. The prerequisites are a "before" and an "after" version in XML format, and a small enough stretch of text to make the comparison manageable.
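
For the curious, here is a minimal sketch of that kind of comparison in Python. The compare() call and its in-place <del>/<ins> markup reflect my reading of the xml_diff documentation, so treat the details as assumptions and check them against the version you install; the "before" and "after" fragments are invented for illustration.

import lxml.etree
from xml_diff import compare

# Invented before/after fragments standing in for real USLM sections.
before = lxml.etree.fromstring("<section>The Secretary shall make payments to the owner.</section>")
after = lxml.etree.fromstring("<section>During fiscal year 2014, the Secretary shall make payments to the owner.</section>")

# compare() walks the text of both trees and marks up the differences in
# place: deletions wrapped in <del> in the first tree, insertions wrapped
# in <ins> in the second.
compare(before, after)

print(lxml.etree.tostring(after, pretty_print=True).decode())

The real work, of course, is in preparing clean "before" and "after" XML for the right stretch of law.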

Automating this analysis is in its infancy, and won't (yet) work for every law. However, the progress that has been made points the way toward a future when such redlining can be shown in real-time for laws around the world.

Wednesday, September 16, 2015

More Elasticsearch: Flexibility without duplicates

People want everything. When they're searching, they want flexibility and they want precision, too. Legal researchers, especially, show this cognitive dissonance: in their personal lives they are used to Google's flexibility ("show me that hairy dog that looks like a mop"), and at work they use 'Advanced' search interfaces that can find the right legal document, if only they write a search query that is sufficiently complex ("show me the rule between September 1981-1983 that has the words 'excessive' and 'sanctions' within 4 words of each other, and does not have the word 'contraband'").

To search through legal documents, precision is important: 42 U.S.C. 2000e-5 (a section of the United States Code) is not the same as 42 U.S.C. 2000e. At the same time, a text search for 'discriminate' should probably also return results that have the word 'discrimination'. Handling this in Elasticsearch (ES) seemed simple at first: create two indexes, or two 'types' within a single index. In essence, we'd index the documents once with a permissive analyzer that doesn't discriminate between 'discriminate' and 'discrimination' (an English-language analyzer) and once with a strict analyzer that breaks words on whitespace and will only match exact terms (read more on ES analyzers here). Search the first index when you want a flexible match and the second one when you want an exact match. So far so good.
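
To see the difference between the two analyzers, the _analyze API shows how each one tokenizes the same word. A quick sketch in Python, using the requests library against a local cluster (the query-string form of _analyze is what the 1.x/2.x clusters of this era accept):

import requests

# Compare how the permissive and strict analyzers treat the same word.
for analyzer in ("english", "whitespace"):
    r = requests.get("http://localhost:9200/_analyze",
                     params={"analyzer": analyzer, "text": "discrimination"})
    tokens = [t["token"] for t in r.json()["tokens"]]
    print(analyzer, tokens)

# The english analyzer stems 'discrimination' and 'discriminate' down to a
# shared root (something like 'discrimin'), so either query term finds the
# other; the whitespace analyzer keeps the exact token, so only an exact
# match will hit.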

None or too many


But what about combining a flexible match with a strict one ("section 2000e-5" AND discriminate)? You either get no results or duplicates. No results are returned if you're looking for the overlap of the two terms: by design, the two indexes were created separately. On the other hand, if you're looking for matches of either term, you get duplicates, one from each index. Back to the drawing board.

To remove duplicates, the internet suggests field collapsing: index each document with the same ID value in both indexes, group results by ID, and set 'top_hits' to 1 to get just one of the two duplicates (see the sketch below). Unfortunately, grouping also breaks the nice results pagination that comes with ES. So you can de-duplicate results, but you can't easily paginate them. This is a problem for searches that return hundreds or thousands of results. For a nice afternoon detour, you can read why pagination and aggregation don't play well together.
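
For reference, the field-collapsing query looks roughly like this. It is only a sketch: the index names and the shared, not_analyzed 'doc_id' field are assumptions, not something you get for free.

import requests

# Search both indexes at once, group hits by a shared ID, and keep only the
# top hit from each group.
query = {
    "size": 0,
    "query": {"match": {"docText": "discriminate"}},
    "aggs": {
        "by_doc": {
            "terms": {"field": "doc_id", "size": 10},
            "aggs": {"best_hit": {"top_hits": {"size": 1}}}
        }
    }
}
r = requests.post("http://localhost:9200/flex_index,exact_index/_search", json=query)

The trouble is that the terms aggregation only limits the number of buckets; there is no 'from'-style offset over buckets, which is exactly why pagination breaks.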

Two fields in one index

O.K., then, how about indexing each field twice within the same document? The two copies should have different names and should be analyzed differently. For example, one could be called 'flex_docText' and the other 'exact_docText'. Combined flexible and exact searches will then point to the same document. And while each field is indexed and searched differently, the original text that ES stores will be the same, so we only need to return one of these fields (it doesn't matter which) to the user.

How-to

The first step is to create the new index with a 'mapping' for the two fields that defines the different analyzers to use for each: 
POST myindex
{"mappings":{
    "mytype" : {
          "properties" : {
                "flex_docText" : { "type": "string",
          "analyzer" : "english" },
                "exact_docText" : { "type": "string",
          "analyzer" : "whitespace" }
          }
    }
  }
};
https://gist.github.com/aih/79155bd4835d3781b380

Next, index the documents, making sure to index the 'docText' field twice, once under each name. This can be as easy as including the content twice when creating the document:

POST /myindex/mytype
{
  "flex_docText": "This is the text to be indexed.",
  "exact_docText":  "This is the text to be indexed."
}
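
With both copies indexed, a single bool query can combine a flexible match against the English-analyzed field with an exact match against the whitespace-analyzed one, and each document comes back only once. A rough sketch in Python (the index, type and field names follow the examples above; adjust them to your own):

import requests

# Flexible match on the English-analyzed copy, exact phrase match on the
# whitespace-analyzed copy of the same text.
query = {
    "query": {
        "bool": {
            "must": [
                {"match": {"flex_docText": "discriminate"}},
                {"match_phrase": {"exact_docText": "section 2000e-5"}}
            ]
        }
    }
}

r = requests.post("http://localhost:9200/myindex/mytype/_search", json=query)
for hit in r.json()["hits"]["hits"]:
    print(hit["_id"], hit["_score"])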

Indexing from SQL

An additional complication arises when importing data from a SQL database. As described in my earlier post, a nice open-source JDBC import tool was built for this purpose. So nice, in fact, that it takes the output of a SQL query and sends it directly to Elasticsearch to be indexed. The downside is that the data is indexed under exactly the name it has in the SQL query. So, if your database column is named 'docText' in a table named 'myTable', you might use this query:

SELECT docText FROM myTable

The JDBC import tool would then index one field, called docText. If you want to create two parallel fields in the index, you need to alias the column and extract it twice from the database, using the following syntax:

SELECT docText as flex_docText, docText as exact_docText FROM myTable

In fact, you can extract the same data as many times as you want, under different names, and apply different analysis to the data in the index mapping.  Does that really work? Yes, that really works.  Now if you want to highlight search results and avoid duplicates, that's a story for another day.

Tuesday, September 15, 2015

Elasticsearch on Windows: Security

A government client wants to be able to search through their data. In a typical case, an office or agency has a long-standing rulebook that employees still carry around in a paper binder, because they lack a browsable, searchable electronic version. For example, the GSA's consulting group, 18F, recently described how they helped the Department of Labor envision and design an electronic version of the Field Operations Handbook. Their prototype included a search layer built on Elasticsearch and served with Apache (presumably on a Linux server).

In almost any nook and cranny of government, you will find similar handbooks, guidebooks, rulebooks and other books that are crying out for a user-friendly search layer. Elasticsearch is an excellent choice for such a project, but it often needs to be integrated with the government's (usually Windows-based) servers. Kudos to 18F for introducing Linux in many of their projects, but for now, those projects are the exception. For the rest of us, using Elasticsearch on Windows presents a number of challenges. All of them can be overcome, though documentation for the solutions is scarce.

In my last post, I described how data from an MS SQL Server database can be imported into and indexed in Elasticsearch. In future posts, I will discuss some indexing and analysis tricks that can make the search experience smoother (like how to avoid retrieving duplicate results). This post is dedicated to securing the Elasticsearch cluster on Windows.

Elasticsearch lacks built-in security

Out of the box, Elasticsearch:
  1.  Uses a separate port (default = :9200) for requests and responses. Searching directly from the browser would require opening the port to web traffic on the server.
  2. Allows data to be changed and deleted, in addition to being retrieved. This is great for a developer who can quickly create, delete and update a search index. You don't usually want your users to have these superpowers, though.
  3. Provides data about the entire search cluster and all indexes on the cluster. That's like having a SQL server open to the world. While this may please Bobby Tables, it makes the rest of us uncomfortable.
Elasticsearch is not built with integrated security features. You can use a separate plugin, called 'Shield', for security, though that introduces another dependency and may be overkill for many use cases. It is also possible to route all traffic to Elasticsearch through your server-side web application, but this requires a lot of duplicate logic to accept and return requests.

Another option is to use features of the IIS webserver itself. For this, you will have to build a search application interface, but for any practical search project you will want to do that anyway. I have stripped down one such GUI to focus on text-based document searches. This starter application (on GitHub) takes user input, formulates searches and returns a paginated list of results. Do let me know if you use it: I have discovered many useful improvements to satisfy specific client needs.

Using a Reverse Proxy on IIS

With IIS serving your main application, you can create a reverse proxy on IIS for Elasticsearch requests. The reverse proxy will translate and reroute URL web requests into internal requests to Elasticsearch. How does this work? Your application's requests to http://www.mydomain.com/search are routed by IIS internally to http://localhost:9200/_search. The internal address is not accessible to outside web traffic.

Advantages:
  1. No need to expose another port. All traffic can be routed through a url on your default web port (:80).
  2. Block delete and change requests -- simply don't set up any URLs that will route these requests to Elasticsearch.
  3. Use IIS security features as needed for your main application (e.g. Anonymous login or Windows login).
On IIS, setting up a reverse proxy is not as tough as it sounds. It requires two modules that can be installed using the Web Platform Installer: URLRewrite and ARR. Thorough instructions for setting up the reverse proxy can be found here. The essentials (and ES-specific details) follow:

Install Web Platform Installer

Pretty straightforward, from a Microsoft downloads page.

Install ARR and URLRewrite modules

Open the Web Platform Installer interface, search for "ARR" and "URLRewrite" respectively, and follow instructions to install each of them.

Create the URLRewrite rules for your website

  1. Open the IIS Server Manager and navigate to your website in the Connections column on the left side. Double click on the Application Request Routing Cache icon (ARR, under IIS, in the Features View). 
  2. Open Server Proxy Settings on the right side of ARR and make sure the checkbox is selected for Enable Proxy. Close the ARR menu. Double click on the  URLRewrite icon. Click Add Rule -> Blank Rule.
  3. Write the rewrite rules for Elasticsearch. The main rewrite rule will use regular expressions to match a pattern like search/_search(.*) and rewrite it as http://localhost:9200/_search{R:1}
  4. You may want to expose other Elasticsearch APIs as well, and it is best to create a rewrite URL for each of them. For example, to check the health of the cluster, match search/_cluster/health to http://localhost:9200/_cluster/health. If you have trouble using the IIS Manager UI to enter these rules, consult the blog I referred to earlier, or add the rules directly to your web.config file.
  5. When you are done entering the rewrite rules, you will have an XML file in your website folder called web.config that will include a <rewrite> section.  It should look something like this file: https://gist.github.com/aih/8f2b8d76b44d8836bd77

Test it out

From your application, you should now be able to submit a query (either as query parameters in the URL or as a JSON query payload) to http://[www.yoursite.com]/search, and get the response from Elasticsearch. Note that the path at /search is now reserved for Elasticsearch traffic, so your application cannot use that path (e.g. for a webpage). If this is a problem, you can use any path you prefer in your reverse proxy settings.

Your thoughts?

The Elastic team described setting up a reverse proxy with Nginx for many of the same reasons, and this does seem like a clean way to expose only the search API to external web traffic. Are there other architectures you have used for Elasticsearch on Windows? Do you see security vulnerabilities with the approach I've described? I'd like to hear your thoughts in the comments.

Monday, August 3, 2015

Elasticsearch and MS SQL Server: Would you like search with that?

In our data-rich world, everything goes better with search. We've gotten used to having search as a baked-in feature of applications and expect the search to be full-featured and flexible. So when building a data-rich enterprise web application, the question is often not whether to include search, but how.

If the data is stored on MS SQL Server, which is usually the case with the government customers we work with at Xcential, one option is the full-text search for MS SQL Server. However, full-text search is a relatively recent addition to SQL Server. Its feature set is limited and in my opinion its query language, an extension of SQL, gets over-complicated quite quickly.

There are also a number of open-source search engines (Elasticsearch and Solr, both based on Apache Lucene, as well as Sphinx) that are fast, robust and used in many of the search functions we rely on at websites such as Twitter, Facebook and Craigslist. The widespread use of these search engines also means that people are familiar with their features, and that they meet users' general expectations of how search should work. As with any open-source project primarily developed in Linux environments, it is important to ask how well these projects play with Windows.

In particular, how can the data from SQL Server be indexed and queried? My recent experience with Elasticsearch shows that, with some caveats, this can now be done rather painlessly.

Import SQL Server Data to Elasticsearch

If you search Google for 'SQL Server and Elasticsearch', you will find many outdated references for how to create an index. Elasticsearch has deprecated its previous tools for importing SQL data, known as 'rivers'. But do not despair. Jörg Prante has built an excellent open-source JDBC importer that supports a variety of SQL databases, including SQL Server. The instructions for using the tool are clear and straightforward, but as Prante writes, Windows users have had lots of problems using it. There are some tricks.

The general instructions for using the JDBC importer are here, but they don't quite work as written on Windows. Steps 1 and 2 are to download and install Elasticsearch. That actually goes quite smoothly. You will need Java (>1.8 update 20 or >1.7 update 55) installed on your system, but if I recall, the elasticsearch.bat script will walk you through installing that if you don't have it already. Once Elasticsearch is installed, I would recommend also installing the Marvel plugin (free developer trial) and starting the Sense interface. This will make it much easier to monitor the indexes you create and to test queries in Elasticsearch.

Step 3 is to install the JDBC importer, using the command-line wget application. While you could install wget by itself, you'll need the bash terminal from Cygwin for a later step, so you might as well install Cygwin now. From the new Cygwin bash terminal, you can use wget to download the JDBC importer.

wget http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/<version>/elasticsearch-jdbc-<version>-dist.zip

Also in Cygwin, move the zip file into your \Program Files (x86) directory and unzip it using Cygwin's unzip utility. (This may require administrative privileges.)

In steps 4 and 5, you download the JDBC driver from Microsoft and put it into the /lib folder of the JDBC importer package.

At step 6 (set up database, make sure to allow TCP/IP connections), there are a number of nuances:
1. Enable TCP/IP in the SQL SERVER CONFIGURATION MANAGER (Start Menu > Microsoft SQL Server > Configuration Tools > SQL Server Configuration Manager > SQL Server Network Configuration).
2. Start the SQL SERVER BROWSER service. To do this, open the CONFIGURATION MANAGER as an administrator (right-click to open). Once in the Manager, right-click the BROWSER service, click Properties and change the Start Mode to Enabled. (http://stackoverflow.com/a/21378235)
3. Explicitly set the TCP port to 1433, as explained in these two answers: http://stackoverflow.com/a/18850073 and http://stackoverflow.com/a/24299346.
Also, change the database server authentication to 'Mixed Mode'. Give permissions on the database to a user and set the password for that user.

Step 7 is to start the elasticsearch engine. Easy enough.

Step 8 has a sample script to import data into elasticsearch.

For Windows, the variables in path names need to be in quotes and should not have braces. So I changed these lines in the script:
-cp "${lib}/*" \
-Dlog4j.configurationFile=${bin}/log4j2.xml \
to
-cp "$lib/*" \
-Dlog4j.configurationFile="$bin"/log4j2.xml \ 
To run the script, I found it most convenient to copy and paste the text directly into the Cygwin bash terminal. Note that to paste into a Cygwin terminal, you can right-click and choose 'Paste', or type Shift-Insert.

I have updated the instructions for using the jdbc importer with MS SQL Server in this gist. Next, I'll discuss how to build a general purpose user interface for the Elasticsearch engine.

Tuesday, March 31, 2015

A Cite to Behold: Standardizing Electronic Legal Citations

Citations are the hyperlinks of law, so it is natural to expect textual citations to be hyperlinked in electronic laws. Numerous parallel efforts are taking on this task for federal government documents online, making those documents easier to read and navigate. Derek Willis (@derekwillis) of the New York Times noticed these links in House Bills published on Congress.gov:


Linked citations are also now found in web interfaces for federal regulations. The terrific FederalRegister.gov parses citations and adds links in its user interface. For example, a recent proposed FCC rule on federalregister.gov has links to other regulations, to Public Laws and to the U.S. Code:

However, these links are not found in the source XML for the document:


That means that when a regulation is finalized and goes into the Code of Federal Regulations (CFR), it will not have the citation links. To remedy that, the eRegulations project from the Consumer Financial Protection Bureau (CFPB) has its own code that parses citations and adds links for CFPB's user-friendly regulations interface. This code seems to work independently from and in parallel to federalregister.gov, doing essentially the same thing at a later stage of the publication process for regulations.

While hyperlinks are being added to citations in various contexts, each implementation uses different methods to parse the citations and different formats for the links. This leads to inconsistent coverage for textual citations and inconsistent formats for identical textual citations. Before long, the lack of consistency will also lead to broken links. Now is therefore a good time to bring some uniformity to these efforts toward a standardized form for government electronic citations.

Here, I'd like to explore what that standardized form should look like. To serve its purpose, a citation (electronic or text) should be unique, clear and permanent. A good citation leads to only one document, unambiguously identifies the portion of the document that is referenced, and will still be valid in the future.

Not all written citations meet these thresholds. In the United States Code, for example, there are two sections designated 5 U.S.C. 3598 (see the footnote for the section). That's right, Congress sometimes passes laws with duplicate section numbers. (True also for 10 USC 127c, 18 USC 39, 28 USC 1932 and 46 USC 70119, which have been redesignated during codification.) There are also written citations that are ambiguous because they are missing context (e.g. 'Section 101 of such Act') or refer to law that itself has since been changed. Nonetheless, to the degree that the written citation is unique and unambiguous, the underlying hyperlink should be as well.

Two years ago, Grant Vergottini discussed progress on standardized electronic citations. Many of the ideas in that post were included in the electronic citation solution underlying the U.S. Code XML data model, called USLM. (Xcential helped develop USLM as part of the House Modernization Project). That two-part solution has proven to be efficient and extensible.

The first part involves an identifier: a standard, permanent path for the citation (e.g. 26 U.S.C. 501(c) -> '/uslm/us/usc/t26/s501/c'). This identifier is clear and unique. It is condensed and human-readable.

In USLM, each subunit of text is marked with an identifier of this form, making it easy to create a link to that subunit. So the subsection corresponding to 26 U.S.C. 501(c) has @identifier = '/uslm/us/usc/t26/s501/c'.

The XML of 26 U.S.C. 501(c), with @identifier = /uslm/us/usc/t26/s501/c
Conversely, any text that cites that subsection has @href = '/uslm/us/usc/t26/s501/c'.

Text citing 26 U.S.C. 501(c), with @href = /uslm/us/usc/t26/s501/c

These identifiers allow fine-grained resolution for links to the U.S. Code from within the Code or from any outside source. They can also be extended to disambiguate between two provisions with identical section numbers, by concatenating additional metadata, such as the law that originally introduced the section (e.g. '/uslm/us/usc/t5/s3598/[/uslm/us/pl/100/32/s5]'). The U.S. Code also includes outbound references to other documents with identifiers of this type:


The second part of the solution is an online service called a 'resolver', which converts an identifier into a URL. The resolver keeps track of the most current and most official sources for documents and ensures that the embedded identifier can remain constant while the sources for these documents--various websites and APIs--may be in flux. Recently, for example, the best source for Congressional bills shifted from thomas.loc.gov to congress.gov, the Library of Congress's new site. As long as the references are based on a standardized identifier, and not the current URL for the source, the resolver can redirect the link to the appropriate source in the future. This takes care of the permanence of electronic citations, allowing government websites to update and improve without affecting the embedded electronic references in legal documents.
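
To make the division of labor concrete, here is a toy resolver sketch in Python. Everything in it is hypothetical: the source table, the example.gov hosts and the lookup logic are placeholders, not a description of any real service.

# Hypothetical table of current sources, keyed by identifier prefix. Only this
# table changes when a source website moves; the identifiers never do.
CURRENT_SOURCES = {
    "/uslm/us/usc/": "https://uscode.example.gov/",  # placeholder host
    "/uslm/us/pl/": "https://laws.example.gov/",     # placeholder host
}

def resolve(identifier):
    """Return the current URL for a permanent identifier."""
    for prefix, base in CURRENT_SOURCES.items():
        if identifier.startswith(prefix):
            return base + identifier[len(prefix):]
    raise KeyError("No source registered for " + identifier)

print(resolve("/uslm/us/usc/t26/s501/c"))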

This approach has been informed by, and is intended to be compatible with, the proposals being developed in the OASIS legal citations working group.

While the core of this solution is now built into the U.S. Code, more needs to be done in order to make it workable more generally. First, identifiers should start to be adopted in other document sources. As more government documents are published in machine-readable form, prompted by good governance or by legislation such as the DATA Act, these documents should have built-in identifiers at the paragraph level, or other appropriate subunits.

Second, a government-wide resolver (which could be hosted on a domain like 'citations.gov') could be used to translate identifiers into current URLs, and to add new document sources (e.g. agency reports) as these become part of the electronic government ecosystem. That way, projects such as the CFPB's could simply prepend the resolver domain to the correct identifier (e.g. https://citations.gov/uslm/us/usc/t26/s501/c) to convert a permanent identifier into a link to the source text. An extension of the resolver could take a textual citation as input, parse the text and return the correct identifier (e.g. 'section 501(c) of title 26, United States Code' -> '/uslm/us/usc/t26/s501/c').
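
Parsing the textual citation is the messier half. A toy sketch of the idea follows; the regular expression handles only the simplest 'title U.S.C. section(subsection)' form and is purely illustrative, since real citation parsing needs far more cases.

import re

# Matches citations like "26 U.S.C. 501(c)".
CITE = re.compile(r"(?P<title>\d+) U\.S\.C\. (?P<sec>[\w-]+)(?:\((?P<sub>\w+)\))?")

def to_identifier(citation):
    """Turn a simple written citation into a USLM-style identifier."""
    m = CITE.search(citation)
    if not m:
        raise ValueError("Unrecognized citation: " + citation)
    identifier = "/uslm/us/usc/t{0}/s{1}".format(m.group("title"), m.group("sec"))
    if m.group("sub"):
        identifier += "/" + m.group("sub")
    return identifier

print(to_identifier("26 U.S.C. 501(c)"))  # -> /uslm/us/usc/t26/s501/c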

With the great energy and infusion of talent into the US Digital Service, combined with the great work that is being done at the Library of Congress, GPO and as part of the House Modernization Project, we can now start to build electronic citations into federal documents that will stand the test of time.

Monday, March 16, 2015

Almost Digital Congress: ToDo List

When fitness hero Jack LaLanne, then in his nineties, was asked about his sex life, he said that he and his wife had sex almost every night: "Almost Monday, almost Tuesday, almost Wednesday...". Similarly, if you ask me about progress in modernizing the legislative process, I would say that Congress is almost digital. Not bad, considering the age of the institution, but it still leaves something to be desired.

The confluence of last week's panel at SXSW, organized by the Congressional Data Coalition, and this week's #Hack4Congress shows that we've come a long way in the last couple of years [1]. In this and the next few posts, I'll discuss what needs to be done to eliminate the "almost" and reap the full benefits of a digital Congress.

My focus is on the key documents that Congress produces: bills, amendments and ultimately, laws. Whether your guide is the Library of Congress or Schoolhouse Rock, the process should be familiar. An introduced bill goes through a number of changes before it becomes law. This is a change-tracking problem and at each stage we need to know (1) the original document, (2) the proposed change and (3) how that proposed change will affect the original.

I was recently asked, for example, if it is possible to show an automated comparison of the impacts on the law of two patent bills now before Congress: Coons's STRONG Patent Act (S. 632) and Goodlatte's Innovation Act, H.R. 9 (114th). The short answer is no. As a pre-requisite, each bill needs to be available in machine-readable form. H.R. 9 is available in XML, but Coons's Act is currently available only in pdf, not text or XML.

As a second prerequisite, we would need a tool to apply the language of the bill directly to the law that it amends, in this case Title 35 of the U.S. Code. Take the example of section 4 of Goodlatte's bill. It currently reads:

(a) Amendments.—Section 290 of title 35, United States Code, is amended—
(1) in the heading, by striking “suits” and inserting “suits; disclosure of interests”;
(2) by striking “The clerks” and inserting “(a) Notice Of Patent Suits.—The clerks”...

These changes would be displayed in the context of the target law (section 290 of Title 35) in redlining, e.g.:
§ 290 Notice of patent suits suits; disclosure of interests
The clerks (a) Notice Of Patent Suits.—The clerks of the courts of the United States, within one month after the filing of an action under this title shall give notice thereof...
I've mocked up these changes by hand above, but if this process were automated, it would be possible to show the impacts of this patent bill side-by-side with other proposed patent bills. The 1929 "Ramseyer Rule" requires this kind of redlining when a bill is reported out of committee, but that is often late in the process. Ideally, the textual impact of a bill could be seen as soon as it is introduced.

In California, my company, Xcential, has worked with the legislature to automate this kind of tracking in its "As Amends the Law" feature. For any bill before the Assembly, the public can see the additions and deletions that the bill would effect in the context of current law. You can see how this works with the bill AB 26 Medical cannabis. You can see the bill's impacts on the current law, as well as the changes between different versions of the bill as it is amended.

The other major change-tracking challenge is the application of amendments to bills. This is the aim of the Amendment Impact Program, which Xcential is working on as part of the U.S. House's Modernization Project. Progress on this project was reported by the Committee on House Administration and in House Legislative Counsel Sandra Strokoff's presentation at the 2014 Legislative Data Transparency Conference. [While I work on these projects for Xcential, the information in this blog all comes from public sources and my opinions here are my own.]

In total, to follow a bill from introduction to law, Congress needs to:
  1. Ensure bill transparency: bill text in consistent machine-readable formats. Many bills, particularly in the Senate, are still only available in pdf until later in the process.
  2. Automate bill-to-law change tracking: apply bills directly to the U.S. Code and to non-positive law statutes.
  3. Pass codification bills: make all titles of the U.S. Code into "positive law". This is primarily a political challenge, which I have discussed previously.
  4. Ensure amendment transparency: amendment text in consistent machine-readable formats.
  5. Automate amendment-to-bill change tracking: now being tackled in the Amendment Impact Program, discussed above.
In future posts, I will discuss these goals in more detail: the benefits that each of them brings, the progress that has been made toward achieving them, and some of the challenges that remain.

[1] Almost two years ago, I declared that legislative data has a posse and since then, that posse has been getting results. Congress now publishes machine-readable forms of most bills and amendments. The U.S. Code is also digitized, and updates are made available on the Law Revision Counsel's website shortly after new bills are passed. These and other advances were highlighted in testimony submitted (pdf) by the Congressional Data Coalition to House appropriators.

The House Majority is making digital transparency a priority, adding this section to the House Rules for the 114th Congress:
The Committee on House Administration, the Clerk, and other officers and officials of the House shall continue efforts to broaden the availability of legislative documents in machine readable formats in the One Hundred Fourteenth Congress in furtherance of the institutional priority of providing public availability and use of legislative information produced by the House and its committees.
Fortunately, more and more citizens are interested in, and educated about, what it will take to fulfill this commitment.

Friday, February 20, 2015

Code Across 2015 -- San Francisco -- And Hiring

I'm looking forward to joining more than a hundred other programmers and civic activists at the Code Across event in SF, part of Code Across 2015, a series of events organized by Code for America as part of International Open Data Day.

The scope of Open Data Day can be seen from this very long list of other Open Data Day hackathons and events (Google Document, not sure where it came from).

And if this event is your kind of thing and you want to make it your day job, get in touch with me. At Xcential, we're always looking for civic-minded programmers who are interested in working on browser-based applications for writing, amending and publishing law. We work with the state of California, Hong Kong, the U.S. Congress and others to bring law into the digital age.

Happy Hacking!

Wednesday, February 4, 2015

UK Sets High Bar for Digital Democracy: Steal this Report, Please

Last week, the Digital Democracy Commission of the UK House of Commons released its report. Just having such a Commission says something (John Boehner, are you taking notes?). According to the Speaker of the House of Commons, the report is meant as the "start of a roadmap for improving and opening up the workings of the House of Commons." With simple but compelling goals. Like:
  • By 2020, the House of Commons should ensure that everyone can understand what it does
  • By 2020, Parliament should be fully interactive and digital
And more. The report, its goals and even its presentation, richly hyperlinked and with embedded videos, are a terrific model to follow. Or, even better, just to steal wholesale. And they make it easy. Substitute "U.S. House" or "Congress" into most of their goals and we have the results of a U.S. Digital Democracy Commission, all nicely packaged. Other countries could do the same.

I envision a cross-Atlantic Commission or workshop on digital democracy (John Boehner, are you still with me?). See, here's the thing. Ideas are catching. If the UK succeeds at Digital Democracy, it makes it easier for us. And the standards that are being developed can make the spread of these ideas even faster (e.g. the AKN legislative data standard, a 'lingua franca' of structured legislation). We can also share technology. The U.S. government has come a long way in the last few years with open source goals and initiatives (e.g. 18F at the GSA, data.gov, and our own work with the U.S. House). The UK has done groundbreaking work at legislation.gov.uk that we could learn a lot from.

Here, I pause for a disclosure of personal interest: the global policies pushed forward by these UK goals are very good news for my company, Xcential. As is the political movement that is growing in the UK to reinforce these open government policies. Because building technology for digital democracy is what we do. And I want to be a part of helping the UK meet its goals. I have long admired the groundbreaking work done by legislation.gov.uk, and see many overlaps with the work we are doing at the U.S. House and in other jurisdictions. Grant and I got a chance to work with hundreds of years' worth of UK laws through the innovative "Good Law" hackathon (we created an XQuery search tool for UK Public General Acts -- http://goodlaw.xcential.com/). And there is a great deal more to be done with these laws, from web-based drafting to standards-based publishing online and on paper.

There are only 5 years before 2020, so we've got to get cracking. Talking about good public policy is fun. Implementing it is priceless.