Wednesday, December 7, 2011
So it should be no surprise that I am intrigued by the esoteric post about this battle by John Barker of Wolters Kluwer (a large legal publisher and owner of CCH, an information service for tax professionals). Barker argues that CCH's product, backed by legal experts who understand the context of arguments in a tax law case, is better positioned than Google Scholar to provide a meaningful result for professional users. Edward Bryant, one of those experts at CCH, writes on his personal blog that the tradeoff between expertise and automation is foremost a question of money: "(1) is an automated algorithm cheaper and (2) is the accuracy level I get through automation acceptable to my customers?"
I have understood these tradeoffs most clearly in discussions with Itai Gurari, formerly of Google Scholar, creator of Tracelaw, and one of the world's experts in automation of caselaw analysis. What Itai does is really hard, and there aren't many people who can do it: while caselaw has an underlying structure due to its logical place in legal canon, finding that structure is an intrinsically hard problem to automate, because every judge thinks he's a poet, novelist or comedian. True experts in a given area of caselaw are rare. Experts who can formalize their knowledge into computer code are rarer still, so automation has a pretty large up-front cost. And there are many subtleties that automated analysis will miss, no matter how many expert hours went into building it.
Where automation (or, more broadly, computer-assisted information processing) shines is in displaying information and making it easier to navigate. Yes, experts' hand-drawn maps have been useful for a very long time, but even the Thomas Guide (remember those?) has given way to Google Maps for most uses.
And that's where we're going with tax26.com: the tax map feature is just the tip of the iceberg of what can be done to map the geography of legal information. Not that this will replace experts any time soon, or my 5-year-old will have to come up with a new title for me.
Wednesday, November 30, 2011
Grant asks a number of thought-provoking questions about creating a uniform semantic web for legal documents, which I think need to be addressed by the legal technology community, and the broader legal community:
What standards would be required? What services would be required?...Should the legal entities that are sources of law assume responsibility for publishing legal documents or should this be left to third party providers?
Edward Bryant takes a different approach in his post about using data in law, focusing on the value of using data to make policy decisions, which are then implemented in laws or regulations. He discusses recommendations by the Ohio tax board to streamline processing of challenges to the state authority's valuations of residential property. The tax board apparently recommends streamlining challenges on residential property, but not commercial property, on the assumption that the residential claims will be less complex. Bryant points out that the board's recommendation would be more credible if it used a little bit of data to correlate case complexity with the type or amount of claim.
I would expand Bryant's point to suggest that many of our leading decision-makers are not equipped to make data-driven decisions. Often, policy decisions like this are made with little or no relevant data, or even in the face of contrary data. Requiring some data-intensive technical training for lawyers would be a good start. How about one semester of Evidence that focused, not on the FRE, but on how to gather and evaluate objective evidence in support of policy or legal decisions? I suspect that if lawyers, in general, were more data-literate, we'd have an easier time answering the questions that Grant poses above, on the way to creating a uniform semantic web for law.
Monday, November 28, 2011
html2text -nobs -ascii -width 200 -style pretty -o filename.txt - < filename.html
So now that you know my current answer, here's the problem: not all html is created equal. Legislative data is published in a variety of formats, some uglier than others. Extracting information requires cleaning these formats up.
When I started to work with California legislation, I had the problem of converting the state's plain text into a simple html for use on web pages. To do that, I used a Perl text2html module. While it takes many steps to produce web-friendly html from California's laws, at least the plain text was not cluttered with formatting symbols and tags that could interfere with the core text.
The problems in other states are far worse. Some versions of Iowa's bills, for example, appear to be published directly from Microsoft Word to the web, which means that they're littered with a maze of formatting information--sometimes positioning each word on the page--that is not related to the text of the bill. Other states use hundreds of cells of an html table (or multiple tables) to format the bill. Looking at the file on the state's website, you wouldn't know that the underlying data is so messy.
Simply stripping all of the html tags won't work, because that eliminates all the formatting information, including information that can change the bill's meaning (spaces, paragraphs). That's unfortunate, because there are many html libraries that would make stripping out the tags easy (e.g. Beautiful Soup for python, or similar libraries in other languages). What I want to do is preserve the formatting, but do it with spaces and paragraphs, not tables or graphically positioning words.
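A minimal sketch of that goal, using only Python's standard-library html.parser rather than Beautiful Soup: block-level tags (p, div, br, tr, li, headings) become line breaks, table cells become spaces on the same line, and underline and strikeout tags are kept as plain-text markers (_added_, ~struck~) so amended language is not silently lost. The tag lists and marker characters are my own assumptions, and this will not untangle Word output that positions each word absolutely; treat it as a starting point, not a finished converter.

```python
import re
from html.parser import HTMLParser

# Assumed, partial tag lists; adjust them to the markup a given state actually uses.
BLOCK_TAGS = {"p", "div", "br", "tr", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
CELL_TAGS = {"td", "th"}
SKIP_TAGS = {"script", "style"}
# Hypothetical plain-text markers so added/struck language is not silently lost.
MARKERS = {"u": "_", "ins": "_", "s": "~", "strike": "~", "del": "~"}


class LayoutToText(HTMLParser):
    """Turn layout-heavy HTML into lines of text, one line per block element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in CELL_TAGS:
            self.parts.append(" ")  # neighbouring cells become words on one line
        elif tag in MARKERS:
            self.parts.append(MARKERS[tag])

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")
        elif tag in MARKERS:
            self.parts.append(MARKERS[tag])

    def handle_data(self, data):
        if not self.skip_depth:
            # Source line breaks inside a paragraph are just whitespace in HTML.
            self.parts.append(re.sub(r"\s+", " ", data))

    def text(self):
        lines = "".join(self.parts).split("\n")
        # Normalize spacing within each line and drop lines that end up empty.
        return "\n".join(" ".join(line.split()) for line in lines if line.strip())


def html_to_text(html):
    parser = LayoutToText()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()
```

Because it keys on tag names rather than on visual layout, this is a much cruder cousin of what a browser's copy and paste does, but it keeps table rows on separate lines instead of flattening the whole bill into one run of words.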
Ironically, the most effective way to clean this messy data is also the easiest: copy and paste the bill displayed in your web browser. After all, the formatting was made for the browser to interpret, and the copy-paste function (at least on a Mac) is quite faithful to the formatting. However, automating this copy-and-paste process is far from simple and, with one exception, I have not seen any programs that make use of this native browser capability to convert files in bulk. The exception is the text-mode web browser Lynx, whose -dump option ("lynx -dump") writes a rendered page out as plain text. However, this converter apparently has a number of faults, including an inability to process tables. Anyone know how to use Chrome or Firefox to automate conversion of html to text for large numbers of files? This is still the solution I'd prefer.
But barring that, I found a close second, in the form of the html2text program. Although it's relatively old (2004), it's fast and deals reasonably with tables and other formatting such as underlining and strikeouts.
Edit: At the suggestion of Frank Bennett, below, I installed the w3m text browser and used it to produce formatted text from html with the following command-line syntax:
w3m filename.html -dump > file.txt
Like html2text, it is fast and produces clean output, actually somewhat too clean. The saved file strips some important formatting information, like <u> (underline) tags, so some caution is in order when using this method.
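For the bulk conversion asked about above, one hedged option is a small Python wrapper that runs a text browser over a whole folder. This sketch assumes w3m is installed and on the PATH; the function name, folder arguments and the cmd parameter are illustrative, and you could pass ("lynx", "-dump") instead to compare the two outputs.

```python
import subprocess
from pathlib import Path

# Assumes w3m is installed and on the PATH.
DEFAULT_CMD = ("w3m", "-dump")


def convert_all(src_dir, dst_dir, cmd=DEFAULT_CMD, pattern="*.html"):
    """Dump every matching HTML file in src_dir to a .txt file in dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for html_file in sorted(src.glob(pattern)):
        # Same effect as: w3m -dump filename.html > filename.txt
        result = subprocess.run(
            [*cmd, str(html_file)], capture_output=True, text=True, check=True
        )
        out_file = dst / (html_file.stem + ".txt")
        out_file.write_text(result.stdout, encoding="utf-8")
        written.append(out_file)
    return written
```

Keep the caveat above in mind: whatever the dump tool drops (such as <u> underlines) is gone from the .txt files too.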
Friday, November 18, 2011
Wouldn't it be nice if governments at all levels would collaborate to create a single nationwide public domain data standard for legislation? That would, for example, make it easier to identify all state laws related to abortion or to compare education laws across jurisdictions. It might be nice, but it's also less likely than the Congressional SuperCommittee reaching a compromise. I won't be holding my breath.
Monday, November 14, 2011
Grant's vision, which I share, is that at some point, legislation from around the world will be published in a standard format so that "you or your business can easily research the laws to which you are subject" due to the growth of an industry that "caters to the needs of the legal profession based on open worldwide standards."
There are a number of questions of how that vision will come to be. I touched on some of these questions in my answer on Quora about the non-technical barriers to using version control for legislation, which stimulated a lively discussion. I'm hopeful, with Grant's new blog, that we can have more of those discussions to work out both the non-technical (mostly political) and technical challenges in the way of open legislative data standards.
Thursday, October 13, 2011
Sunday, October 9, 2011
Saturday, October 1, 2011
Thursday, September 22, 2011
How do I know? The Google Analytics report for this blog showed that phrase as one that led here.
So if you're wondering, my 2 cents on the subject: one great use for the internet is to make legal information more accessible. While I think that the internet can be valuable to share photos, videos, tweets, tumbles, sparks and other gems, it can also be used to share our basic legal rulebooks and court decisions in a way that is accessible to everyone who is bound by them.
A small core of hackers, consisting of Grant Vergottini, Greg Willson, Mike Tahani and myself (with support from Karen Suhaka's excellent team at BillTrack50) is moving forward to apply the sample timeline to all sections of California's codes, and to link external data to code sections. In particular, Matt has written functions to link Maplight.org's lobbyist and bill positions data to California statutes.
Meanwhile, we're working with Common Cause (Philip Ung), Sunlight Foundation (Laurenellen McCann) and Maplight (Jeff ErnstFriedman and team) to debrief hackathon results and apply this momentum to strengthen Open Government initiatives in California.
If you want to chip in, contact me, or add to the growing wiki here.
Sunday, September 18, 2011
Special guest appearance by John Sheridan, architect of legislation.gov.uk and guidance on California data APIs from Grant Vergottini, architect of California's LegisWeb and legix.info, and the Maplight team. Amazing cross-country coordination and promotion by Robert Richards.
Participants, photos, thanks, some of my embarrassing source code and early results are on the wiki and will continue to be updated:
Improved documentation of the event and access to California legislative data coming soon.
Wednesday, September 14, 2011
Join us this Saturday to hack world-class apps for California's legislation. This hackathon was born on the Sunlight Foundation's Open State project listserv, to extend the great work there! Please forward to your lists and groups!
Monday, September 12, 2011
Do you have ideas for the hackathon? Add them here*:
Current ideas include:
- Cluster related code sections, for search and navigation
- Create a timeline view for each code section
- Bulk downloads for codes and legislation
- Create identifiers for useful legislative units (e.g. language on "unfair practices")
- Track movement of statutory text from one place in the code to another
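As one hedged illustration of the last idea, tracking moved text can start with a brute-force similarity search: for each section of an old version of a code whose text is no longer at its original number, look for the most similar section elsewhere in the new version. The dict-of-sections input and the 0.9 cutoff are assumptions for this sketch, and the pairwise comparison is quadratic, so a run over a whole code would need some indexing first.

```python
from difflib import SequenceMatcher


def find_moves(old, new, threshold=0.9):
    """Return (old_id, new_id, score) for text that reappears under a new number.

    `old` and `new` map section identifiers to section text. The 0.9
    similarity cutoff is an arbitrary, illustrative choice.
    """
    moves = []
    for old_id, old_text in old.items():
        if old_id in new and new[old_id] == old_text:
            continue  # unchanged in place, so not a move
        best_id, best_score = None, 0.0
        for new_id, new_text in new.items():
            if new_id == old_id:
                continue
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score > best_score:
                best_id, best_score = new_id, score
        if best_id is not None and best_score >= threshold:
            moves.append((old_id, best_id, round(best_score, 3)))
    return moves
```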
Thursday, September 1, 2011
Pressures on lawyers and law firms to become more efficient, and to adopt advances in technology are now becoming publicly visible in a number of ways. One of them is the rise in online legal services that Hane describes in her article, another is the turmoil surrounding high law school tuitions and the weak market for new lawyers, a third is the growing interest in legal information from technologists and technology companies (e.g. legal content on Google Scholar and Google Ventures' investments in LawPivot and RocketLawyer).
These changes highlight two essential components of law: information and judgment. A comment on judgment first:
Engineers often make the mistake of assuming that the entire function of law can be outsourced to technology. That assumption is fed by a line of thinking that runs straight up to the Supreme Court: that judgment is just the application of law to facts, as in Chief Justice Roberts' "balls and strikes" analogy at his confirmation hearing. That view suggests that judgment can be replaced by an algorithm. That, I hope, is not the direction of improved legal technology.
Where legal technology shines is in distilling information in a form that makes it easier for a decision maker to apply good judgment, and which clears out much of the information overload that surrounds many legal issues. The tech world is only now scratching the surface of what can be done to distill the information of law, which is just text after all. As an example, a friend of mine, Itai Gurari, is building an engine that can identify the relevant legal points in a court opinion (check out his search engine, Tracelaw, here). If you want to get involved in this exciting field, a good place to start is with state statutes and by helping us with the first-ever California Law Hackathon.
Wednesday, August 31, 2011
The California Codes Hackathon, now scheduled for September 17 in San Francisco and Denver, is gathering steam.
In addition to the excellent hackers who've signed up to prepare data for the hackathon (check out the wiki), supporters now include:
Sunlight Foundation (thanks, Laurenellen McCann and James Turk), Common Cause (Philip Ung), NationBuilder (Adriel Hampton). We'll be announcing more soon...
Sign up now to join us: Facebook Event or through NationBuilder (same list).
Monday, August 22, 2011
Thursday, August 18, 2011
Also wanted to note that my California laws site (calaw.tabulaw.com) is down, due to an Amazon Web Services interruption. I will restore it soon on a new AWS instance. If you're interested in such things, the notice is below the fold.
Thursday, August 11, 2011
It's true: California's laws now come with an unofficial RESTful API. This is a great boost for California Law Hackathon plans: now programmers can dive right in and develop innovative ways of presenting and navigating the data in their favorite format (JSON, XML and RDF, among others). If you want to jump ahead to see the API specifications, they are available on the legix.info site here and I've posted them here on the California Law Hackathon wiki.
- Target date for the hackathon: September
- Prepare data and tools for hackathon participants
- Prepare a list of projects and goals (e.g. legislative time-machine, before and after redlining for bills)
Wednesday, August 3, 2011
I've set up a collaborative editor at http://rasa.tabulaw.com/p/calaw so we can jot notes for the planning meeting itself.
Look for more details on govfresh.com, where Luke Fretwell has generously offered to publicize the meeting.
Monday, August 1, 2011
I've written before about my efforts to wrangle text files of California's codes into structured data that is easier to navigate than the official site (leginfo.ca.gov). I've posted the new version of California's codes on calaw.tabulaw.com, and made the computer code available on github. Now, California's Legislative Counsel has made the raw data, in XML format, available for FTP download here. What is remarkable is that the FTP data comes with all of the SQL scripts and a guide to set up your own database of California's laws and bills, *updated each day*.
2. Show a redlined version of California's Codes, for any bill that would amend them.
3. Immediately update California's Codes when a new bill is passed.
4. Feature modern search and navigational tools to smoothly get from any place in the codes to any other.
Monday, July 25, 2011
One answer for graduates, as I suggested, is to proactively define and promote their own expertise. Another is for this generation of new graduates to join with one another and use technology to their advantage. This generation may face the worst economic situation for lawyers since the 1930s and is largely being shut out of traditional firms. But it is also (by definition) the most wired generation ever, with access to technologies that can bring tremendous value and efficiency to legal practice. These technologies include:
1. Social networking to bring in business from around the corner and around the globe.
2. Better, inexpensive and free online research platforms.
3. Virtual law firms, which can lower overhead and increase transparency by providing links to the public work product of firm lawyers, and facilitate rating or referrals from clients.
4. Workflow technologies to offer better and more efficient service to clients.
We're working on a couple of these technologies (see tabulaw.com, tax26.com) and I believe that the next couple of years will bring many more.
If you are part of this new generation of lawyers, what role do you see for technology in law? What technologies would you like to see for lawyers?
Friday, July 22, 2011
David Lat, founder of Above the Law, argues for replacing the third year of law school with the beginning of an apprenticeship. Perhaps not surprisingly, but somewhat disappointingly, three law school professors say that the current model is fine and that law school is a good opportunity, regardless of career prospects, to become "citizen scholars" (with > $120k in debt?).
This discussion has mirrored many conversations I have had recently with lawyers and Bay Area entrepreneurs who are looking not just at law school, but at the creaking wheels of law, and are working on ways to innovate. Just this afternoon, I met with Tim Hwang, U.C. Berkeley Law student and partner in the fictional Robot, Robot & Hwang law firm. Earlier this year, Hwang organized a conference of technologists and entrepreneurs in the legal space, and is currently working on projects that reinvent the relationship between law and technology. We spoke about the generation of lawyers who are not following the traditional path from school into law firms--whether by choice or, more often, due to the downturn in the economy. Will this generation just disappear, or will it push for a transformation of legal practice?
I've also been speaking with Vivi Hoang, a recent law school graduate, who is working on an innovative plan to develop a startup law clinic at a Bay Area law school that would bring together students with entrepreneurs and law firms in this area. This would seem to be a win-win all around: law students would get hands-on experience with legal issues that arise in startups; startups would be able to trim their legal bills, and participating law firms would be able to work with promising startups without taking on the full risk of a fee deferral arrangement.
Perhaps most encouraging was a meeting I had with Avlok Kohli and Kevin O'Keefe this morning, during Kevin's brief stop in the Bay Area from Seattle. Avlok is a brilliant strategist, software engineer and co-founder of a startup in the legal space that I've been working with. Kevin is the founder of the LexBlog network (tagline: "Real Lawyers Have Blogs") and has built a considerable following by helping lawyers to develop a thoughtful and effective approach to social media. Kevin, like many of the others whom I've spoken with recently, recognizes the need for a cultural shift among lawyers: in the way we communicate with each other and with the public, and in our use of technology. The skills that Kevin has emphasized for practicing lawyers (developing and sharing their expertise in blogs and elsewhere online) are even more critical for the current generation of law school graduates. These graduates will have to fend for themselves more and more, as the nature of the legal market and legal services inevitably changes.
So while law students wait for the reforms that David Lat and others are calling for in New York Times op-eds, they would do well to follow Kevin's advice and start now building their individual reputation online. In other words: Learn to stop worrying and love to blog.
Tuesday, July 12, 2011
Update: Upon further investigation, it seems that Congress has now repealed (at least one) section 139D. An update on the House website notes that "Section repealed by Pub. L. 112-10, sec. 1858(b)(2)(A)". This update raises its own questions, since the bill that became Pub. L. 112-10 (HR 1473) does not seem to have a section 1858.
Friday, July 8, 2011
It does look, however, like reform of many provisions of the tax code will be central to securing a deal, in which case the changes can serve as a kind of dry run for the larger overhaul that President Obama called for in his State of the Union Address.
A great opportunity to simplify the tax code not only by closing loopholes, but also by writing any new tax legislation so that humans can better understand it and computers can better process it. See recommendations elsewhere on this blog and at 21stcenturytaxation.com: start by using plain language in writing any new tax legislation.
Wednesday, June 29, 2011
Check out the post and leave your comments here: http://21stcenturytaxation.blogspot.com/2011/06/tax-law-access-in-21st-century-guest.html
Monday, June 27, 2011
If you squint, you might be able to find a couple of intersections, but not many. I think that this is a problem that can be solved largely by providing a clean, obvious, technical solution for lawmakers. To borrow from the Godfather: offer legislators a solution they can't refuse (more below).
But this question asks about the non-technical* barriers, and these are largely inertial. The legal community is unaware of the powerful text-based tools that could make legal work more accessible to the public and more efficient. Meanwhile, there is no "version control" lobby in Congress. So although adding version control would make a tremendous difference to the efficiency of the legal process, few people understand the value that it would bring. I've written about the potential benefits in a couple of specific cases: and
Much of the current system for drafting, publishing and updating U.S. laws is more than two hundred years old, depending on how you count. It is internally consistent (mostly) and is actually quite sensible for organizing legislation into printed books.**
In the case of U.S. Federal legislation, the significant burden of writing, compiling and publishing U.S. laws is divided among three different institutions: the Office of Legislative Counsel of the U.S. House is in charge of formatting and printing legislative drafts and proposed legislation; the Law Revision Counsel of the U.S. House maintains and updates the U.S. Code on a 6 year schedule***; and the Government Printing Office is in charge of printing the official version of the U.S. Code. When these roles were originally established, they provided the human resources and Quality Assurance to maintain an organized body of law. The challenge is to move from this system to one that is suited for an electronic age.
Each of the three institutions works with legislation in a different primary format. Where metadata has been added, e.g. to create an XML or HTML version, the formats are not consistent with each other. This is a technical barrier that will require a non-technical solution (choosing one format and responsible institution over the others). It's a question of awareness and political will.
This year has seen some progress on both counts. Just a couple of months ago, Speaker of the House John Boehner and Majority Leader Eric Cantor wrote a letter to the Clerk of the House, calling for e-formats for legislation.**** The Sunlight Foundation has been doing great work in pushing for transparency in government, including more consistency in e-formats for legislation.
This is where I think a technical solution (and technical people) can make a difference. We can develop a solution that "just works": showing a redlined version of laws for any bill, accurately showing changes in the U.S. Code as soon as an amendment is enacted, and browsing of legislative history like the MacOS Time Machine. A non-partisan solution that could save money and increase transparency, all at the push of a button. I still wouldn't underestimate the power of inertia, but having an elegant and simple technical solution close at hand will make it much more likely that legislators will make the change.
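As a rough sketch of the "redlined version" piece, Python's difflib can already mark word-level changes between two versions of a section. This assumes the hard part, applying a bill's amending language to produce the new text, has already been done; the function name and the <del>/<ins> markup are just one illustrative choice of output.

```python
from difflib import SequenceMatcher
from html import escape


def redline(old_text, new_text):
    """Word-level redline: deleted words in <del>, inserted words in <ins>."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.append(escape(" ".join(old_words[i1:i2])))
        if op in ("delete", "replace"):
            out.append("<del>" + escape(" ".join(old_words[i1:i2])) + "</del>")
        if op in ("insert", "replace"):
            out.append("<ins>" + escape(" ".join(new_words[j1:j2])) + "</ins>")
    return " ".join(out)
```

For example, redline("The tax is 5 percent.", "The tax is 6 percent.") gives "The tax is <del>5</del> <ins>6</ins> percent.", which a stylesheet can render as strikeout and underline.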
*By "technical" I assume the question refers to the algorithm that would actually be used to implement version control, and "non-technical", I assume, means the political or historical resistance to change.
**Legislators, and the legal community as a whole, have yet to make the transition from print-centered formatting to electronic formatting. Legal documents, even if originated and consumed electronically, are still formatted as if destined primarily for print.
***The U.S. Code is a compilation of U.S. Federal laws into 50 Titles, divided by subject area. http://www.house.gov/hous
****I highlight this letter, and some of the technical challenges to converting legislation into version-control-friendly formats, on my blog: http://blog.tabulaw.com/2
Thursday, June 23, 2011
[UPDATE: I have shut down the live CA Laws demo website; legix.info provides the internal hyperlinks that I had built into my site, and is kept up-to-date. The Android app is also not working now.] Download the new California Laws app here for free, test it out and let me know what you think. To install, you need to download directly to your Android device and open from the System tray.
A few downsides which can be cured in future versions:
- Tables of contents require scrolling across the screen
- Appgeyser puts an ad at the bottom of the application for their service
Wednesday, June 22, 2011
I also implemented an idea by Jason Wilson, and seconded (or at least retweeted) by Robert Richards, to add headings to California law sections, to help provide context. Wilson, of Jones McClure, a legal publisher, has given a lot of thought to legal technology and has many interesting ideas on how to make legal technology better. His suggestion on the California Laws site is just the kind of exchange I was hoping to generate. If you have ideas or suggestions to improve navigation of California's laws (or the Internal Revenue Code), let me know on twitter (@arihersh or @tabulaw) or in comments below.
In future posts I will flesh out details of how this could work, in the context of California law and in open sourcing the Internal Revenue Code.
Friday, June 17, 2011
- Introduce meaningful metadata into the text.
- Parse or draft new tax-related bills so that they can be:
  - instantly compared to existing laws and, when passed,
  - used to immediately update a public, online version of the new law.
- Create a platform that experts and professionals can use to research, debate and explain the law.
*The LRC version is up-to-date through January 2011.
Thursday, June 9, 2011
| 1. | Write in Plain English | Mistweet constituents |
| 2. | Make Laws Web-Friendly | Cut Funding for Transparency |
| 3. | Make Court Opinions Accessible | Seek Love on Craigslist |
| 4. | Invest in Science and Tech Education | Add Facebook and Google to DHS |
| 5. | Keep Your Shirt On | See Link in Previous Column |
Monday, June 6, 2011
I've made some improvements to calaw.tabulaw.com, which has all of California's legal codes with internal links for easy navigation of the laws. It now also has a fast search engine, powered by Sphinx.
Know anyone who works with California state law? Pass this on to them. Anyone in the legislature? They might want to replace the aging leginfo.ca.gov...
Wednesday, June 1, 2011
I found more than a thousand errors in the course of parsing the online version of California's legal codes. At first, I thought there might be something wrong with my parsing algorithms -- I had, indeed, gone through a number of rounds of bug-fixing. These repeated sections were carried over to the site I've published (calaw.tabulaw.com). Having parsed the sections, it would take just a few minutes to clean up the duplicates, but just to make sure I looked back at the California legislature's website.
When I looked at the original data on the California legislature's website, I saw the sections repeated verbatim. I've collected the 1,368 repeated sections (about 2%), and most look like errors in California's original conversion from print to electronic document.
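A sketch of how repeats like these can be flagged automatically. It assumes each section begins on a new line with its number and a period (e.g. "1084."); that layout is an assumption about the source files, and a body line that happens to start with a number and a period would be misread as a new section. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout: a section starts at the beginning of a line with "1084." or "100.1."
SECTION_START = re.compile(r"^(\d+(?:\.\d+)*)\.\s", re.MULTILINE)


def split_sections(code_text):
    """Split a code's plain text into (section_number, section_text) pairs."""
    starts = list(SECTION_START.finditer(code_text))
    sections = []
    for m, nxt in zip(starts, starts[1:] + [None]):
        end = nxt.start() if nxt else len(code_text)
        sections.append((m.group(1), code_text[m.start():end].strip()))
    return sections


def find_repeats(code_text):
    """Return {section_number: count} for sections whose text appears more than once."""
    seen = defaultdict(int)
    for number, text in split_sections(code_text):
        # Normalize whitespace so line-wrapping differences don't hide duplicates.
        seen[(number, " ".join(text.split()))] += 1
    return {number: n for (number, _), n in seen.items() if n > 1}
```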
Want to see for yourself? Check out these sample sections:
Ý1084.] Section Ten Hundred and Eighty-four. The writ of mandamusmay be denominated a writ of mandate.
Friday, May 27, 2011
For this task, I went back to another old Unix utility: find. If you type "find /" from a command prompt in Linux (or MacOS), you get a list of all of the files and folders on your computer. Don't do this. It will take a long time, and is not really useful for anything. But you can use this powerful command within a single directory, and send the list of file names to a program that will operate on each one. In this case, I wrapped this all in a Python program, using the subprocess module's Popen() to run any Linux commands that I wanted. Gory details below the fold.
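The same pattern, sketched in Python with subprocess.Popen: find produces the file list and a Python function (handle, a hypothetical hook you supply) does the work on each file. This illustrates the approach rather than reproducing the script I ran, and it assumes file names contain no newline characters.

```python
import subprocess


def for_each_file(directory, name_pattern, handle):
    """Stream `find` output through Popen and call handle(path) on each file."""
    cmd = ["find", directory, "-type", "f", "-name", name_pattern]
    results = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        # Read paths as find emits them, one per line.
        for line in proc.stdout:
            path = line.rstrip("\n")
            if path:
                results.append(handle(path))
    # Leaving the with-block waits for find, so returncode is set here.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
    return results
```

For example, for_each_file("codes", "*.txt", some_function) would apply some_function to every .txt file under a codes folder (both names hypothetical). Note that find does not promise any particular order.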
[Image: CA Codes After]
If you want to skip the details and go straight to the results, I've put the newly transformed California code sections on a website (calaw.tabulaw.com). Currently, the design is very simple and has no styling whatsoever. But I welcome you to do a before and after comparison and let me know what you think in the comments.
In my view, converting CA Legislation to structured data makes navigating the code much easier. It also reveals some problems with the version on California's website-- repeated sections, stray text markings--that should probably be cleaned up. More about these anomalies, and the brave new world that structured data can bring to law, in future posts.
Tuesday, May 24, 2011
"pursuant to the provisions of Part 2.5 (commencing with Section 18901) of Division 13 of the Health and Safety Code"
s_Health and Safety Code_<a href="/Code-hsc">Health and Safety Code</a>_
pursuant to Chapter 3.5 (commencing with Section
11340), Chapter 4 (commencing with Section 11370), or Chapter 5
(commencing with Section 11500), of Part 1 of Division 3 of Title 2
of the Government Code
of the <a href="/Code-gov">Government Code</a>
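A line-by-line sed rule misses a code name that is split across lines (e.g. "Government" at the end of one line and "Code" at the start of the next). A hedged Python sketch of one fix: let every space in a code name match any run of whitespace, newlines included. The two href targets come from the examples above; the mapping name, the function name, and any further entries are illustrative assumptions.

```python
import re

# The two link targets shown above; extend the mapping for other codes.
CODE_LINKS = {
    "Health and Safety Code": "/Code-hsc",
    "Government Code": "/Code-gov",
}


def link_code_names(text, links=CODE_LINKS):
    """Wrap each code name in an <a> tag, even if the name wraps onto a new line."""
    # Longest names first, so a longer name is preferred when names overlap.
    names = sorted(links, key=len, reverse=True)
    # Each space in a name may match any run of whitespace, including newlines.
    alternatives = [r"\s+".join(re.escape(w) for w in name.split()) for name in names]
    pattern = re.compile(r"\b(?:" + "|".join(alternatives) + r")\b")

    def to_link(match):
        canonical = " ".join(match.group(0).split())  # undo the line wrapping
        return f'<a href="{links[canonical]}">{match.group(0)}</a>'

    return pattern.sub(to_link, text)
```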
Monday, May 23, 2011
How to find section headings in a text document and convert them to targets for hyperlinks?
If you have ever had this burning question, you'll want to read on. Or you can take my word for it that it would have been better for this information to be included in the documents when they were originally published.
This post describes Step 2 of 5 to convert California statutes to structured html: Identify section, subsection and subdivision headings. To do this, I am using an old (1970s) Unix program, standard on Linux, called "sed" (stream editor).
There are lots of ways to do this using more modern programming languages, but sed has the advantages that it is VERY fast and that it handles opening, editing and closing a file for you. It's basically a "find and replace" function on steroids, without the need for Congressional hearings.
I must admit that once I got the hang of sed, and its improved cousin, "Super Sed" (ssed), it was pretty addictive: with one command, you can change all capital letters in a document to lower case, replace all vowels with a *, or mark all numbers and letters at the beginning of a paragraph as section and subsection headings. Sed goes through a file one line at a time and makes these substitutions, and there are many other things you can do with it on the same line-by-line basis. If this sounds like fun to you, look here for a good tutorial.
I was working with California state statutes, which I had earlier converted to html. Fortunately, the statute text has a very regular structure: sections, subdivisions and other levels of the document were marked at the beginning of lines, with consistent spacing setting them apart.
So to find the section headings, I just needed to create a set of rules (using RegEx), that describe each kind of section heading. California statutes use headings with the following levels:
100.1 (a) (1)
So I needed to describe each of these section headings in a way that they could be identified and separated from any other numbers and letters that are found within the statutes. Here's an example of a rule that does this:
s_^<p>([1-9]\d*)\._<p><span class="section level1" id="sec-\1\.">\1\.<\/span>_
It looks gory, but is actually pretty tame. In essence, it says to substitute (s_) any number at the beginning of a line (^) and beginning of a paragraph (<p>) with a label (<span>) that will identify this number as a section heading. Each kind of heading requires another rule to describe it, and then all of these rules are applied to the file using the ssed (Super Sed) command. The result converts a section heading like this:
<p>15210. Notwithstanding any other provision of this code, as used in
to something like this:
<p><span class="section level1" id="sec-15210.">15210.</span> Notwithstanding any other provision of this code, as used in
Not rocket science, but one step closer to structured data. The <span> will allow us to separate out this section from the rest of the text in order, for example, to link to this section from another section that references it.
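For readers who prefer Python to sed, here is a hedged re-creation of the heading rules with the re module. The level-1 rule mirrors the sed rule above, extended (as an assumption) to accept decimal section numbers like 100.1; the level-2 and level-3 patterns and class names are illustrative guesses, and a subdivision such as "(i)" would be ambiguous if a roman-numeral level is also in use.

```python
import re

# Each rule: (compiled pattern, replacement). Only level 1 mirrors the sed rule;
# levels 2 and 3 are illustrative assumptions.
HEADING_RULES = [
    # Level 1: "15210." or "100.1." at the start of a paragraph.
    (re.compile(r"^<p>([1-9]\d*(?:\.\d+)?)\.", re.MULTILINE),
     r'<p><span class="section level1" id="sec-\1.">\1.</span>'),
    # Level 2: subdivisions such as "(a)" at the start of a paragraph.
    (re.compile(r"^<p>\(([a-z])\)", re.MULTILINE),
     r'<p><span class="section level2">(\1)</span>'),
    # Level 3: paragraphs such as "(1)" at the start of a paragraph.
    (re.compile(r"^<p>\((\d+)\)", re.MULTILINE),
     r'<p><span class="section level3">(\1)</span>'),
]


def mark_headings(html):
    """Apply every heading rule in turn, like running ssed with several expressions."""
    for pattern, replacement in HEADING_RULES:
        html = pattern.sub(replacement, html)
    return html
```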
The next step is to find all of the references to other sections that appear inside the statute text and to link each reference to the section it refers to. Because those references may span more than one line, it is harder to use a line-by-line editor such as sed to do the job. For this, I put together a short search-and-replace program in the Python programming language, which is more flexible and has a lot of tools for working with text. That will be step 3 in the 5-step process, for a future post.
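In the meantime, here is a minimal sketch of the core idea: run the regular expression over the whole text instead of line by line, so that "Section" and its number can sit on different lines. It links to the id="sec-N." anchors produced by the heading rule above, and it deliberately ignores the hard part: a reference like "Section 11340 ... of the Government Code" points into a different code and would need its own target. The pattern and function names are hypothetical.

```python
import re

# Matches "Section 11340" even when "Section" and the number are on different lines.
SECTION_REF = re.compile(r"\bSection\s+(\d+(?:\.\d+)?)\b")


def link_section_refs(text):
    """Link each in-text "Section N" to the id="sec-N." anchor made in step 2."""
    return SECTION_REF.sub(
        lambda m: f'<a href="#sec-{m.group(1)}.">{m.group(0)}</a>', text
    )
```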
As I mentioned earlier, I will be publishing the final scripts on Github, and will be publishing the hyperlinked version of California legislative information. And hopefully this can inspire California's legislature to publish the statutes in a structured data format to begin with, which can be combined with the OpenStates data to make it easier to see the changes that would be made by any proposed legislation.