If you want to learn about the process of getting a proposed bill passed, you can read the official explanation on a state senate site. It’s remarkably similar to the steps involved for federal legislation, according to the explanation offered to the protagonist of Mr. Smith Goes to Washington. What the explanations don’t reveal, however, are the entities behind the proposed legislation.
The actual authors of proposed legislation don’t sign their names, but they do leave signatures of a sort, the signals of individual style that can be found throughout their written work. All it takes is reading through thousands of proposed bills to find the textual clues that link bills to the same source. The only drawback is coming up with the time it takes for humans to read through it all. But this is one problem that technology can solve.
One of the presentations featured at Bloomberg's Data for Good Exchange described an approach to mining legislative text to identify the sources behind the bills. Applying technology to sift through masses of documents that would take humans thousands of hours to read is the project that a group of five has been working on at the University of Chicago's Data Science for Social Good program.
Sifting through each piece of legislation to find matches is far too time-consuming, and relying on Google doesn’t cut it because its results are not confined to legislation and do not bring up complete documents. A more specialized tool is needed for the focus on state legislation, one they call the Legislative Influence Detector (LID). In just seconds, it can search through complete documents and will only report on matches within the legislation category.
Explaining their approach, the researchers pointed out that the Smith-Waterman local-alignment algorithm is too slow on its own to sift through so many texts. So they start with Elasticsearch, which calculates Lucene scores to narrow the candidates down to the 100 most similar texts. Those are then compared to the document in question with the local-alignment algorithm. Because it preserves the sequence of words, it is far more precise and accurate than a bag-of-words model.
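To make the second stage concrete, here is a minimal sketch of word-level Smith-Waterman local alignment in Python. This is a generic textbook implementation, not the researchers' actual code: the scoring parameters are illustrative, and the Elasticsearch/Lucene pre-filter that narrows the field to 100 candidates is omitted.

```python
# Word-level Smith-Waterman local alignment: finds the highest-scoring
# matching passage between two word sequences, rewarding matched words
# and penalizing substitutions and gaps. Scores here are illustrative.

def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local-alignment score between two word lists."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of an alignment ending at a[i-1], b[j-1];
    # the 0 floor lets an alignment restart anywhere (the "local" part).
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Hypothetical bill snippets with near-identical wording:
bill_a = "no abortion shall be performed after nineteen weeks".split()
bill_b = "an abortion may not be performed after nineteen weeks".split()
print(smith_waterman(bill_a, bill_b))
```

Because the alignment respects word order, a long shared run like "performed after nineteen weeks" scores much higher than the same words scattered through unrelated text, which is exactly what a bag-of-words comparison cannot distinguish.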
On the basis of the matches uncovered by LID, reporters or interested parties can track special-interest influence through the trail set by the matches. To illustrate the point, they show a screenshot of LID finding similarities between Wisconsin Senate Bill 179 (2015), which restricts abortions after 19 weeks of gestation, and Louisiana Senate Bill 593 (2012). The wording of the two is almost identical. Both bills reflect a conservative agenda, though the group doesn't point out which particular special interest is behind them.
The LID group admits certain shortcomings of the solution, such as the fact that it’s limited to the bills collected by the Sunlight Foundation, although those alone top half a million bills. But what they don’t point to is the possibility of political bias in the selection of bills they focus on. They say they use “2,400 pieces of model legislation written by lobbyists” that is largely based on the collection of ALEC Exposed, the organization devoted to depicting the American Legislative Exchange Council (ALEC) as a bastion of corrupt corporate influence on legislation supported by the GOP.
ALEC Exposed is part of The Center for Media and Democracy (CMD), which calls itself “a national media group that conducts in-depth investigations into corruption and the undue influence of corporations on media and democracy.” While that makes it sound completely objective, it generally is characterized as “liberal,” even “uber-liberal,” “left-wing,” and “anti-capitalist.” So it’s not at all surprising that it would target the conservative ALEC. That is not to say that the data is incorrect, but that transparency should really be free of political party influence. Tools like LID are truly “data for good” only if they apply the same standards to all parties.
Have you ever tried to figure out who the backers of a piece of legislation might be? How did that go?
It would seem that if the lawyers could cut down the length of all the sentences, most of which are dozens of words long, that in itself would halve the number of pages in most bills. I've noticed that at our state level, bills often are presented to local legislators at scheduled meetings by the bill's proponent. I would think there should be some record of who these lobbyists are who are writing the proposed legislation.
@Lyndon_Henry yes, it is a huge task. It took this group quite some time to get to this point. There is an opportunity to expand this with collaboration and, possibly, some volunteer labor to work on gathering bills and putting them into a format that is algorithm-friendly. It could be a good project for college students, one of interest both to those drawn to the legal process and to those who want to work in data science.
As usual, Ariella has raised some extremely interesting issues with this blog post.
Ariella quotes the Chicago researchers:
Still, the data are not comprehensive. We cannot find matches for bills that failed to make their way into Sunlight's database...
I agree with what I think Ariella is saying — that it would be really, really fascinating to start having analyses of comprehensive databases of legislative bills, in a way that could track lobbyists' influence, especially spanning a variety of states. I'm really curious, for example, about how the influence of liberal lobbyists might compare with that of the rightwing lobbyists.
However, I'd also think accumulating such a database could be quite an arduous task. As the researchers also state,
Many states reintroduce nearly entire bills, sometimes automatically, which means the matches are strong and the algorithm is slow.
So you'd surely want to exclude mere repetitions of previous introductions of the same bills. And there are undoubtedly other problems. The data-cleaning challenge is huge. Just accumulating a comprehensive body of data sounds like a monumental project in itself.
@Lyndon_Henry I'm sure that even the group would admit that what they have so far is still work in progress. They have said as much here:
Still, the data are not comprehensive. We cannot find matches for bills that failed to make their way into Sunlight's database, nor can we use model legislation that is not public. We also miss bills rewritten to avoid detection -- although LID does make it more difficult for lobbyists and legislators to work in secret by forcing them to rewrite every time they want to introduce their legislation.
Finally, we have not yet run comparisons within states (for example, we do not compare an Illinois bill to past Illinois bills.) Many states reintroduce nearly entire bills, sometimes automatically, which means the matches are strong and the algorithm is slow.
I'm merely pointing out that in the expansion, their direction should not be limited by organizations whose very names indicate they're only interested in exposing one particular entity associated with the GOP. To be even-handed, any political agenda that gets snuck in should be brought to light -- even if it is the type of bill that the people behind ALEC Exposed would want passed. If they -- or only people sympathetic to their political choices -- are to be the gatekeepers on what is subject to data mining, that's not going to happen.
... if you stack the deck of data with what is given by an entity that clearly has an agenda against one party, it's hardly going to cover the influence that can be coming through lobbyists and foundations supporting the other party. My point is that political influence cuts both ways.
First of all, I agree that biased cherry-picking of data is an unacceptable methodology, and a database of legislative bills should be comprehensive and unbiased, so (in this case) influence from various political tendencies would be identified.
But, from Ariella's description, I don't see that such a principle was necessarily violated. According to Ariella's blog, the University of Chicago researchers used "2,400 pieces of model legislation written by lobbyists" that were "largely based on the collection of ALEC Exposed", a group characterized as politically liberal (it clearly has an agenda of "exposing" the conservative group ALEC).
A crucial question here is: How did the researchers with ALEC Exposed select these 2,400 pieces of legislation? I wouldn't pull an alarm just because of the group's political agenda. Methodology, in a sense, is everything. Did the university researchers examine that aspect? Were the 2,400 cases selected randomly, perhaps in a comprehensive sample, or via a biased process of cherry-picking?
Basically, if you disqualify research out of hand from researchers with "agendas" or political leanings (which has a kind of ad-hominem thrust), you're gonna eliminate virtually all research everywhere.
Obviously, research, to be convincing (and hopefully not embarrassing), must stand up to scrutiny of its methodology. Expecting to convince political leaders, the media, and the public of your assertions on the basis of clearly skewed data selection is just stupid.
@Lyndon_Henry certainly, but if you stack the deck of data with what is given by an entity that clearly has an agenda against one party, it's hardly going to cover the influence that can be coming through lobbyists and foundations supporting the other party. My point is that political influence cuts both ways. (That is certainly true of the way the climate debate has been framed, though I don't wish to digress.) Casting one party as the good guy and one as the bad is as naive as it is inaccurate.
I know people who will make a point of not voting for a Democrat, no matter what the candidate himself is about. I also know people who make a point of demonizing all Republicans. I don't know if corrupt people tend to go into politics or if the politics is what corrupts them, but generally you will not find anyone who is free of influence and some agenda in a political position. Personally, I'm not in favor of any party, and I can recognize both the right-wing bias of something like National Review and the left-wing bias of something like Salon. The problem is that people who are utterly immersed in their politics are blind to those biases.
ALEC Exposed is part of The Center for Media and Democracy (CMD), which calls itself "a national media group that conducts in-depth investigations into corruption and the undue influence of corporations on media and democracy." While that makes it sound completely objective, it generally is characterized as "liberal," even "uber-liberal," "left-wing," and "anti-capitalist". So it's not at all surprising that it would target the conservative ALEC. That is not to say that the data is incorrect, but that transparency should really be free of political party influence.
I imagine that most of the staff of the CMD vote for Democrats, but I don't see evidence that a political party (Democratic?) is actively influencing the CMD's research.
In any case, the political leanings of the researchers should be of less concern than the integrity of their research and other professional activity and the methodology they use to select, for example, the half-million legislative bills in the sample.
For the record ... my own impression is that research results are far less trustworthy from right-leaning organizations than from those considered more "left-leaning" (which these days means almost anything outside the GOP).
Let's also be aware that, for many on the extreme right, terms like "liberal," "uber-liberal," "left-wing," and "anti-capitalist" are used to characterize, say, the world scientific community (in the context of the global climate change issue).
On the issue of ALEC ... Even without sophisticated algorithms, this conservative group's influence has been identified in boilerplate legislation (and other political intervention) promoted in various states, mainly politically GOP-leaning and swing states. One such issue that comes to mind is the flurry of Voter ID laws (with boilerplate wording) passed by GOP-controlled legislatures in a number of such states.