Suppose that the good work of an analyst is hijacked or shared for use in ways it was never intended.
When I read the story in the Los Angeles Times, my mind drifted to those science/adventure novels where a researcher's laboratory work on a cure for a disease is turned against humanity by a villain.
The secondary use of Elliott's healthcare research doesn't threaten humanity. In fact, I don't want to judge whether its reuse was good or bad, but it's a scenario worth remembering.
Working for the research firm Rand Corp., Elliott helped develop a method to spot disparities in the healthcare provided to minority patients. He was on a team of Rand researchers who developed Bayesian Improved Surname Geocoding more than a decade ago.
Faced with the reality that insurers and providers don't always collect race data from their clients, the tool infers race by matching last names with addresses. The algorithm assigns a percentage indicating the likelihood that an individual is white, black, Hispanic, or Asian. According to the LA Times article, the method for pairing two "knowns" -- name and address -- has been pretty effective at identifying an "unknown" -- someone's race -- in the healthcare role for which it was intended.
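The basic idea behind a BISG-style calculation can be sketched in a few lines. This is a toy illustration of the Bayesian update -- a surname-based prior combined with a geography-based likelihood -- not Rand's actual implementation, and all of the probabilities below are made-up numbers:

```python
# Toy BISG-style sketch. Real implementations draw P(race | surname) from the
# Census Bureau's surname list and P(geo | race) from tract-level Census data;
# the figures here are invented purely for illustration.

RACES = ["white", "black", "hispanic", "asian"]

# Hypothetical prior: P(race | surname) for some surname.
p_race_given_surname = {"white": 0.05, "black": 0.01, "hispanic": 0.92, "asian": 0.02}

# Hypothetical likelihood: P(geo | race), the share of each group
# living in the address's census tract.
p_geo_given_race = {"white": 0.0001, "black": 0.0002, "hispanic": 0.0010, "asian": 0.0003}

def bisg_posterior(p_surname, p_geo):
    """Combine the surname prior with the geography likelihood via Bayes' rule:
    P(race | surname, geo) is proportional to P(race | surname) * P(geo | race)."""
    unnormalized = {r: p_surname[r] * p_geo[r] for r in RACES}
    total = sum(unnormalized.values())
    return {r: unnormalized[r] / total for r in RACES}

posterior = bisg_posterior(p_race_given_surname, p_geo_given_race)
```

The output is the kind of percentage breakdown the article describes: one probability per group, summing to one, with the two "knowns" each nudging the estimate.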
So, Elliott -- who has a PhD in statistics -- was quite surprised when a friend emailed him in 2013, saying, “Did you know you just cost Ally Financial $80 million?”
Ally wasn't even in the healthcare space; it is the finance arm of General Motors. Ally paid that sum to settle a racial discrimination case brought by the federal Consumer Financial Protection Bureau. The agency had been using the Rand formula to identify patterns of racial discrimination in consumer lending in the auto industry.
The Rand tool also has been the focus of criticism by Republicans who claim it is flawed, or "junk science." My concern over the use and reuse of Elliott's tool has little to do with the CFPB's actions themselves. The issue is what else might happen to the work of today's data science teams a year or a decade down the road.
The business world, including non-profits, has thousands of analytics initiatives in progress, and it's going to become tough to know how they will be used in the future. Algorithms can be shared in a number of ways. The reality is that some of them will be reused in inappropriate ways, perhaps harming reputations and leading to legal action against innocents -- maybe even the original authors of the algorithm.
All of us express concern about our data and how it is shared and misused. Now there is a real possibility that the tools used to analyze that data will be shared and misused as well.
Have you seen other cases like that of the Rand tool?