When Good Algorithms, Tech Stop Being Good


(Image: dencg/Shutterstock)

Back in August, All Analytics Editor in Chief Jim Connolly posted a blog that addressed the unanticipated use of Marc Elliott's algorithm, which was originally designed for medical research but was ultimately used to guess the race of people applying for credit, unbeknownst to the algorithm's creator. Jim's blog drew my attention because it is a great example of the potential onset of "technological iatrogenesis." Iatrogenesis is a medical term derived from Greek that means "brought forth by the healer." Medically, it refers to the negative effects caused by medical or surgical procedures intended to help. In short, iatrogenesis refers to a cure that is worse than the disease.

In linking this concept to information technology in general, and to analytics in particular, technological iatrogenesis (TI) occurs when the deployment of a technical product, good, or service creates more problems than it eliminates. In that light, Jim's article caught my eye because it surfaced the risk that data scientists, analysts, statisticians, and others could misapply algorithms and other tools under the assumption that one size fits all. Elliott designed his algorithm for one industry, and to his surprise it was being applied beyond the intended scope.

There are at least four ways that TI can occur. The first assumes that a technology is impervious to time. Tape backups are a traditional way of storing mass quantities of data, and at one point they were a leading-edge solution. But without migrating at some point from tape (which degrades over time) to digital or cloud storage, historical business assets will be lost. Magnetic tape cured the storage problem in the past, but it inherently becomes a risk over time.

The second way that TI can occur assumes that embedded business rules are applicable across functional units, industries, or broad categories of analysis. For example, if your system's business rules seasonally adjust changes in employment based on industry (retail hiring may change when students are available for summer work), those same rules may produce deleterious results if you use them to seasonally adjust changes based on geography. Rules based on what an establishment does are not suitable for adjustments based on where the establishment is located.
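To make the mismatch concrete, here is a minimal sketch in Python. The adjustment factors, keys, and function name are hypothetical illustrations, not actual BLS rules; the point is that rules keyed by industry fail loudly (or worse, silently) when handed a geography key.

# Minimal sketch: hypothetical seasonal-adjustment factors keyed by INDUSTRY.
# Reusing them for a GEOGRAPHY-based analysis silently applies the wrong rules.

INDUSTRY_SUMMER_FACTORS = {     # hypothetical multipliers, not real figures
    "retail": 1.15,             # summer hiring bump from student labor
    "education": 0.80,          # school staff rolls shrink over the summer
}

def adjust_employment(raw_count: float, key: str) -> float:
    """Apply a seasonal factor looked up by industry code."""
    factor = INDUSTRY_SUMMER_FACTORS.get(key)
    if factor is None:
        # A geography key such as "baltimore_md" falls through here; failing
        # loudly is safer than defaulting to 1.0 and hiding the mismatch.
        raise KeyError(f"No industry-based rule for key {key!r}")
    return raw_count * factor

print(adjust_employment(10_000, "retail"))          # intended use: 11500.0
# print(adjust_employment(10_000, "baltimore_md"))  # misuse: raises KeyError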

The third way that TI can occur assumes that data definitions have transitive properties and travel intact from system to system; this is the assumption that all terms are synonymous and transferable across all systems. For example, a database variable labeled "hotness" that was designed for a dating service would have a whole different meaning if used in a system designed to predict menopause symptoms. In like manner, a database variable labeled "nova" would have an astronomical meaning to a space scientist, a marketing meaning to a Chevrolet sales manager, and a quality meaning to automobile customers in Latin America.
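One way to guard against this is to keep a per-system data dictionary and refuse to reuse a variable whose definition differs across systems. The sketch below is illustrative only; the system names, definitions, and helper function are all hypothetical.

# Minimal sketch: the same column name carries different meanings in different
# systems, so a per-system data dictionary is checked before any reuse.

DATA_DICTIONARIES = {
    "dating_service": {"hotness": "self-reported attractiveness score, 1-10"},
    "health_system":  {"hotness": "hot-flash frequency, episodes per week"},
}

def assert_same_meaning(column: str, source: str, target: str) -> None:
    """Refuse to reuse a column across systems unless its definition matches."""
    src_def = DATA_DICTIONARIES[source].get(column)
    tgt_def = DATA_DICTIONARIES[target].get(column)
    if src_def != tgt_def:
        raise ValueError(
            f"'{column}' means {src_def!r} in {source} "
            f"but {tgt_def!r} in {target}; do not merge blindly."
        )

# assert_same_meaning("hotness", "dating_service", "health_system")  # raises ValueError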

The fourth way that TI can occur assumes that an IT solution designed for decentralized processing can be easily migrated into a centralized processing architecture. Decentralized systems are typically self-contained and have direct access to whatever processing resources are needed, whenever the resources are needed, for as long as the resources are needed. Centralized systems typically share some resources, even if configured as a series of virtual machines. Granted, centralized systems may be cheaper, but they work best when all of the processing nodes are intentionally designed in that manner from the start. Attempting to indiscriminately integrate a decentralized solution into a centralized architecture is not wise and will be more expensive than designing a "plug and play" architecture from the beginning.
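A back-of-the-envelope sketch (with purely illustrative numbers) shows why a workload designed around dedicated nodes slows down when it must queue for a shared, centralized pool.

import math

def decentralized_runtime(jobs: int, hours_per_job: float) -> float:
    # Every job has its own node, so the job count does not stretch the wall clock.
    return hours_per_job

def centralized_runtime(jobs: int, hours_per_job: float, shared_slots: int) -> float:
    # Jobs queue in waves for a fixed pool of shared slots; contention adds wall-clock time.
    waves = math.ceil(jobs / shared_slots)
    return waves * hours_per_job

print(decentralized_runtime(12, 2.0))   # 2.0 hours: every job runs on its own node
print(centralized_runtime(12, 2.0, 4))  # 6.0 hours: 12 jobs share 4 slots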

In these days when organizations are seeking sharable and reusable solutions, it makes sense to avoid reinventing the wheel. But great care, reflection, and investigation need to occur to prevent the desired cures from being worse than the illness. While analytics professionals do not take the Hippocratic Oath, we should in some fashion observe this saying: Primum non nocere -- "First, do no harm."

What about you? Do you have any TI examples? Please share.

Bryan Beverly, Statistician, Bureau of Labor Statistics

Bryan K. Beverly is from Baltimore. He has a BA in sociology from Morgan State University and an MAS degree in IT management from Johns Hopkins University. His continuing education consists of project management training through the ESI International/George Washington University programs. He began his career in 1984, the same year he was introduced to SAS software. Over the course of nearly 30 years, he has used SAS for data processing, analytics, report generation, and application development on mainframes, mini-computers, and PCs. Bryan has worked in the private sector, public sector, and academia in the Baltimore/Washington region. His work initially focused on programming, but over the years has expanded into project management and business development. Bryan has participated in in-house SAS user groups and SAS user group conferences, and has published in SAS newsletters, as well as company-based newsletters. Over time, his publications have expanded from providing SAS technical tips to examining the sociological, philosophical, financial, and political contexts in which IT is deployed. He believes that the key to a successful IT career is to maintain your skills and think like the person who signs your paycheck.



Re: Losing data to time
  • 3/21/2017 12:27:05 PM

@Kq4ym - agreed. If data retention is mandated by law or litigation, or can be written off as a tax loss, then you know ahead of time. But otherwise - who knows?

Re: Losing data to time
  • 3/20/2017 8:49:51 AM

Yes, keeping old data and then making good use of it may very well be beneficial, but trying to figure out ahead of time which data to keep and then testing to see if the return on investment is worth it can be a tricky piece of business.

Re: Losing data to time
  • 3/17/2017 11:33:45 PM

@Lyndon_H - The thalidomide crisis - yes - perfect example. That makes me remember the problems caused by DDT. I believe that economists and lawyers call those 'negative externalities' (as in The Tragedy of the Commons). I know of someone whose fingers were deformed because her mom took thalidomide.

Re: Losing data to time
  • 3/17/2017 4:59:50 PM

Kq4ym writes:

Doing no harm may be the goal, but when big chunks of money are involved, things often go astray. While "a cure that is worse than the disease" is nothing we intentionally plan, those plans get a bit out of hand when unbridled curiosity or the promise of great profits seems to be ahead. Perhaps social scientists should be involved along with data scientists in planning and program design?

My own thoughts go in a somewhat similar direction. When it comes down to it, just about any new technological development can be applied to harmful purposes – it all depends on whether individuals or agencies with maleficent intentions can get their hands on it and deploy it to those purposes. The airplane was a revolutionary invention that speeded passenger travel and mail delivery – but it also became one of the most formidable and horrific weapons ever devised. Likewise, lasers can be deployed as excellent surgical tools or fearsome weapons.

In my mind, a more clear-cut case of "technological iatrogenesis" would be something like a new anti-cancer vaccine that turns out to engender a different serious disease (say, ALS or Huntington's) in recipients. The thalidomide disaster might also fit in this category.


Re: Losing data to time
  • 3/17/2017 4:44:09 PM

@SethBL - So noted! To use a culinary metaphor - borrow the recipe but season to taste. Great 'cautionary tale' - thanks!

Re: Losing data to time
  • 3/17/2017 2:19:54 PM

I think every industry borrows technology from other industries, and that includes algorithms. When I do an industry analysis I'm always borrowing from other analyses, but with the knowledge that it will have to be re-crafted over and over to fit its new purpose. This type of behavior should be encouraged because that is one of the ways research and technology grow. However, it should only be done knowing that it is a new starting point and may not be as effective or have a desirable outcome.

Re: Losing data to time
  • 3/14/2017 10:27:38 AM

@kq4ym - Interesting suggestion! Certainly having an interdisciplinary team brings multiple perspectives to the table and reduces the risk of TI. The challenge is to find the right number of people - too few and you don't have enough perspectives, but too many and you never get any work done. And to your point, when the leadership has tunnel vision on revenue, then wanting to see any other view is a hard selling point. But yes, I think your suggestion is great.

Re: Losing data to time
  • 3/14/2017 8:36:59 AM

Doing no harm may be the goal, but when big chunks of money are involved, things often go astray. While "a cure that is worse than the disease" is nothing we intentionally plan, those plans get a bit out of hand when unbridled curiosity or the promise of great profits seems to be ahead. Perhaps social scientists should be involved along with data scientists in planning and program design?

Re: Losing data to time
  • 3/13/2017 1:25:19 PM

@Jim, Excellent point. The complex system architectures of today (and the associated multiple vendors who promise perfect system integration – but that is for another topic ...) are designed for high output, low costs, ease of use, and easy maintenance of the components. Hence disaster recovery (DR) is less of a concern, especially in the private sector. As it relates to the feds, those agencies whose output/products are critical to the nation's welfare (akin to only essential personnel coming to work on snow days – which suggests that most people are not essential – but that is for another topic ...) do have DR plans in place. Actually, the Labor Department has annual mandatory risk and security awareness training for federal and contract employees. So to your point, I am guessing that the degree of risk exposure, costs outweighing the benefits, or the arrogance of thinking that nothing could go wrong is what is behind a decrease in the attention paid to DR in general.

Actually, your statements just raised an interesting question: There is so much talk about the dollar value in data, but if there is no DR plan in place, then are those valuation statements true? Seems to me that if data is the goose that lays the golden eggs, then you might want to store a DNA sample so that you could clone it. But please note that my career period in business development makes me suspicious of the marketing language of the data evangelists. I am wondering if the lack of DR means that data assets are overvalued, or that it won't be a concern unless a big problem occurs. Yeah – if you think of DR as data insurance, and if there is a decline in the implementation of DR, then you have to wonder if DR is overpriced or if the data assets are undervalued.

Thanks Jim – I think I smell a future post cooking.

Re: Losing data to time
  • 3/13/2017 12:02:55 PM

@Bryan. I haven't seen any data about it recently, but I recall that a few years back research showed that the percentage of companies that actually run true disaster recovery tests is low, like well below half. At one time, when computing was centralized and you could have all of your key functions backed up by a hot site, I think the number of organizations that could run a true DR test probably was higher.

Imagine doing a companywide DR test today. While you probably could do it on a single application or function, doing so companywide would require testing recovery with many different services -- data feeds, SaaS apps, remote access systems, etc. -- and so many different stakeholders and department-level IT teams. It would seem that just the planning would take so long that half the configurations and people would change before it was time to test.
