Analytical Code: No Place for Secrecy

Whether you're producing analytics outputs or reviewing the results, you want to economize your efforts. You want to use the most concise and efficient code but still have the maximum information for understanding.

George Zipf, a linguist and a mathematical sociologist, identified this dynamic in his principle of least effort, which says that, in situations allowing alternatives, people choose the procedures that result in the least average rate of probable work. In data communications, bytes are bundled/coded into packets to minimize the amount of information the sender pushes through a channel. The receiver then must unbundle/decode that packet, so it can verify and validate the information with minimal effort. In presentations, it's not unusual to have speakers use a lot of acronyms: RUP, JAD, and SDLC. They do so to minimize the effort involved in saying "Rational Unified Process," "joint application development," and "system development life cycle." During the Q&A, audience members often ask the speakers to explain the acronyms.

As for analytics, if you're a coder who insists on minimizing code (because, for coders, nothing exists beyond efficiency and elegance), you should maximize internal documentation/comments and end-user/technical documentation. Keeping your code logic wrapped up in a shroud of mystery may make the program execute faster -- and be your way of assuring job security -- but you have to make it easy for someone to understand your stuff, in case you become unavailable for explanations.
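To make that concrete, here is a minimal sketch in Python of concise code that still explains itself. Everything in it is hypothetical and chosen only for illustration: the file name, the function name `trimmed_mean`, and the trimming rule are my assumptions, not a standard. The point is the shape: a header block, a docstring, and a comment on the one line a reviewer would otherwise have to puzzle over.

```python
"""
Program : trimmed_mean.py (hypothetical example)
Purpose : Average a list of readings after dropping extreme values.
Inputs  : values - list of numbers; trim - fraction dropped from EACH end.
Output  : float, the mean of the values that remain.
"""


def trimmed_mean(values, trim=0.1):
    """Return the mean of `values` after discarding the lowest and
    highest `trim` fraction of observations."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")

    ordered = sorted(values)
    # Number of observations to drop from each tail (rounded down).
    k = int(len(ordered) * trim)
    # Slice off both tails; when k == 0 nothing is dropped.
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)
```

The logic is still only four lines, so nothing was sacrificed for efficiency, but someone inheriting the program can tell what it does and why without tracking down the author.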

Process diagrams are one way of bringing balance to the Force between analytic authors and audiences. These diagrams can be effective because seeing the paths of black box (inputs and outputs) and white box (detailed view of the data transformation steps) eliminates the struggle between those who want to say less and those who need to hear more. Though a complex diagram carries the risk of being too busy, it still makes it easier for analysts to express themselves and for the audience to understand vis-à-vis words alone.

Where possible, I recommend starting out with a diagrammatic representation and then presenting text/code. When visual common ground is established, the senders and receivers will be less likely to get angry with each other because the words became lost in translation.

Sometimes I am ROFL when I see acronyms used as boundary maintenance mechanisms by the self-described cognoscenti and the analytics intelligentsia. The attempt to confuse your audience with esoteric abbreviations and terms can escalate the perception of your IQ and discourage others from asking questions.

BTW, what do you think? Does Zipf's theory mean that people -- senders and receivers -- are basically lazy? Can analysts communicate clearly with those in the C-suite? WDYT?

Bryan Beverly, Statistician, Bureau of Labor Statistics

Bryan K. Beverly is from Baltimore. He has a BA in sociology from Morgan State University and an MAS degree in IT management from Johns Hopkins University. His continuing education consists of project management training through the ESI International/George Washington University programs. He began his career in 1984, the same year he was introduced to SAS software. Over the course of nearly 30 years, he has used SAS for data processing, analytics, report generation, and application development on mainframes, mini-computers, and PCs. Bryan has worked in the private sector, public sector, and academia in the Baltimore/Washington region. His work initially focused on programming, but over the years has expanded into project management and business development. Bryan has participated in in-house SAS user groups and SAS user group conferences, and has published in SAS newsletters, as well as company-based newsletters. Over time, his publications have expanded from providing SAS technical tips to examining the sociological, philosophical, financial, and political contexts in which IT is deployed. He believes that the key to a successful IT career is to maintain your skills and think like the person who signs your paycheck.

Re: Coder turned tech writer
  • 1/17/2013 7:38:57 PM

Kicheko, discussing code responsibility also opens the discussion with clients as to what it takes to execute an analytics-related project. I am not sure it is a bad thing for one person to manage code if the person understands the project and its scope for a client. But there is a breaking point where a team needs to be responsible rather than a single source. The need to understand scope is a challenging discussion - some things are unknowns - but it's a worthwhile discussion to set expectations as much as possible.

Re: Secrecy kills
  • 12/18/2012 11:03:49 AM

MNorth - great piece and on target. Like in the Wizard of Oz - 'don't look behind the curtain!' There are sociological and psychological dimensions to analytics - in a nutshell - we are insecure beings - collectively and individually. Hope we can move toward transparency and trust. As in the film - Wizards can be useful even when we are completely open and honest. Until then - it's still a 'numbers game'.

Re: Coder turned tech writer
  • 12/18/2012 10:58:09 AM


Agree wholeheartedly!  But in this day, it still happens.  This is very risky, but some shops become so comfortable with the flow of activity over a number of years that they forget that sudden and unexpected transitions can happen.  Some people want to be the keeper of the secret code and their managers will let them. After all - what could happen????

Re: Coder turned tech writer
  • 12/17/2012 8:43:31 AM

Bryan, IMO it is always a bad idea to be the single person that knows and understands your code. The best code is written in teams. Even as a customer I wouldn't go for one-man code that is undocumented, because when the programmer dies (worst-case scenario) or is out of the country and we have an emergency, that will be it with that system.

Secrecy kills
  • 12/14/2012 3:03:30 PM

In my piece on the Bowl Championship Series last week, I mentioned that one of the greatest criticisms is that the BCS formula uses computer models that are not open for public scrutiny.  The computers analyze the data and spit out these rankings, but no one can determine what happened in the code to decide if the rankings mean anything useful.

Re: Coder turned tech writer
  • 12/14/2012 1:33:37 PM

Hi Beth,

The standard IT answer is - 'it depends'. Typically a coder will/should internally document the program (header template with in-line and block comments); but this assumes that s/he works in a structured environment where there is turnover and someone else may maintain the code down the road. In shops where a coder is 'fire-proof', works alone and assumes s/he will stay/live forever, there may not be as much internal documentation because the functionality is committed to memory. And as long as you are the only person who understands the code, your job is safe. You protect your knowledge of the code as a Jedi protects a lightsaber.

A coder may also do the end-user or technical documentation, but it depends on whether this is a small company where the coder does the work of three people, or a company that can afford a business analyst or tech writer. If the company is a contractor, they will have a separate person for external documentation - each additional person represents revenue. If they are smart, they will bill for a business analyst, requirements person, tester and tech writer to create a paper trail outside of the code itself. But beyond the revenue value, it does help to have someone to ensure that the internal and external documentation is useful, usable and used.

Another factor is whether English is the person's primary language. For the sake of revenue enhancement, I have seen staff augmentation with an emphasis on capturing off-shore talent.  These employees are not always comfortable writing down comments for fear that their written English skills will expose them or get them labeled as expendable. Hence, being hesitant to provide anything beyond very basic statements does not mean a lack of conscientiousness, but the need to get comfortable in putting one's words in the public square.

So yes, Beth, 'it depends'.

Coder turned tech writer
  • 12/14/2012 10:34:11 AM

Bryan, interesting advice! Do you find that coders are typically writing the document around their work or working with somebody else who can do that for them? I'm thinking a coder might not always be the best person to communicate that info.