Thursday, February 26, 2009

Where do I get my Data?

The short answer to the question of where I get my data is that I get it from the State Legislature's website. I then have to manipulate the data, which normally comes as flat text in either HTML or PDF format, to the point where I can get it into a spreadsheet or database with one record per line-item. I also download materials made available by the Governor's office on his website, which, thankfully, include a number of useful Excel spreadsheets.
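
To give a rough idea of what "one record per line-item" means in practice, here is a minimal Python sketch. It assumes the text has already been pulled out of the HTML or PDF, and that each line-item starts on its own line with a code of the form 0000-0000 followed by some text. That layout, the sample text, and the function names are my assumptions for illustration; the real documents are messier and need more cleanup than this.

```python
import csv
import io
import re

# Assumed layout: a line-item code (four digits, hyphen, four digits)
# at the start of a line, followed by descriptive text.
LINE_ITEM = re.compile(r"^\s*(\d{4}-\d{4})\s+(.*\S)\s*$")


def extract_records(text):
    """Return one dict per line that begins with a line-item code."""
    records = []
    for line in text.splitlines():
        match = LINE_ITEM.match(line)
        if match:
            records.append({"line_item": match.group(1), "text": match.group(2)})
    return records


def to_csv(records):
    """Write the records as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["line_item", "text"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Invented sample text, not the wording of any actual budget.
sample = """SECTION 2.
0320-0003 Supreme Judicial Court
0321-0001 Commission on Judicial Conduct
"""

print(to_csv(extract_records(sample)))
```

Lines that do not start with a code, such as section headings, are simply skipped, which is usually what you want for a first pass.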

Naturally, the data that I have is only as good as the data that the Legislature and Governor publish on the web. While most of that data is reasonably good, there are occasionally substantial errors that require quite a bit of work to sort out. Take, for example, the FY 2004 Budget, enacted as Chapter 26 of the Acts of 2003.

This particular document has an alarming number of errors in it. The problems include the following:
  • First, the document is missing hundreds of line-items, which simply did not make it into the published version. For example, 0320-0003, the line-item for the Supreme Judicial Court, is nowhere to be found.

  • Second, the document repeats 169 line-items verbatim. It runs up to line-item 4000-0300 and then begins again at line-item 0610-0093 (Bonus Payments to Persian Gulf War Veterans) which, naturally enough, appears twice.

This does raise the question: who proofs this stuff? These are not minor problems, and the document, by the way, is still uncorrected almost six years later. The only way I was able to make these determinations was by finding the Conference Committee report for that year, importing that budget, and then comparing the two.
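
For anyone who wants to run the same kind of check, the comparison can be sketched in a few lines of Python. This is only an illustration under assumptions: both documents have already been reduced to lists of line-item codes, the function name is hypothetical, and the tiny lists below are stand-ins built from codes mentioned in this post, not the full contents of either document.

```python
from collections import Counter


def compare_line_items(published, reference):
    """Compare two lists of line-item codes.

    Returns (missing, repeated):
      missing  - codes in the reference list but absent from the published one
      repeated - codes that appear more than once in the published list
    """
    counts = Counter(published)
    missing = sorted(set(reference) - set(published))
    repeated = sorted(code for code, n in counts.items() if n > 1)
    return missing, repeated


# Stand-in data: the published list restarts at 0610-0093 and lacks 0320-0003.
published = ["0610-0093", "4000-0300", "0610-0093"]
reference = ["0320-0003", "0610-0093", "4000-0300"]

missing, repeated = compare_line_items(published, reference)
print("Missing from published document:", missing)
print("Repeated in published document:", repeated)
```

The same two lists could just as easily be compared in a spreadsheet or with a database query; the point is only that the Conference Committee report serves as the reference against which the published document is checked.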

But don't take my word for it: check it yourself. See if you can find any of the following line-items:
  • 0321-0001: Commission on Judicial Conduct
  • 0321-0100: Board of Bar Examiners
  • 0321-1500: Committee for Public Counsel Services
  • 4510-0113: DPH Office of Rural Health
  • 7002-0500: Division of Industrial Accidents
  • 7002-0600: Labor Relations Commission
  • 8000-0105: Office of the Chief Medical Examiner
  • 8000-0106: State Police Crime Laboratory

You can find them if you go to the Conference Committee report (House 4004) and look for them there.
