- by Lyndon_Henry, Blogger
- 10/30/2013 12:51:36 PM
Can you imagine how much more of a weapon the data becomes in bigger, more notably politically corrupt cities like here in Chicago where I live? On the upside, it sounds like you've got a great many challenges to tackle, which will keep you busy!
I'm somewhat aware of the problems in Chicago -- one of the members of my transit discussion forum is a community transit activist there and relates his own problems struggling with bad transit decisions motivated by incompetence and political influences.
I'm sure we'd all like to have firm confidence in government and transit agency officials, and their competence in planning for our communities, but unfortunately this isn't always the case. I could give numerous examples, but the following excerpt from my own comments posted this AM to the coalition of community groups I mentioned earlier may help capture some of the jaw-dropping issues we're contending with.
This refers to a Map Book of data visualizations -- I could call it "Data Visualization Gone Weird". I think someone got the idea that data visualization was the Big Thing and was the proper way to present data for decisionmaking, but the process went downhill from there.
The whole approach to studying alternative corridors and selecting one for Austin's urban rail system has sorta gone haywire, as my comments indicate (subject line was "Map Book fails"):
The Project Connect Map Book does contain some interesting data visualizations, but its reliability is questionable. Overall, the information displayed in the Map Book is so seriously flawed and deficient that I would not regard it as an adequate basis for decisionmaking in terms of selecting an urban rail corridor.
Here are just a few of the problems I find with the latest version (v. 5)...
• The segmentation of the huge central city study area (misnamed "Central Corridor") into segmented sectors (misnamed "sub-corridors") makes it impossible to evaluate actual TRAVEL CORRIDORS and thus potentially viable TRANSIT corridors. I've explained this in previous postings.
• Each of these sectors ("sub-corridors") is populated with dubious data, for most of which no validation is given (a large number of citations refer mainly to "Alliance Transportation Group", whose credibility has not been established).
• The representation of data via an attempt at data visualization is distorted and misleading in many key respects. For example, there are mismatches between the sizes of some data bubbles on maps and the numerical values they're supposed to convey. The position of data bubbles does not necessarily correspond with the location of the demographic or other features they are meant to convey. Traffic congestion is represented by coloring streets with red or deep red, but no effort is made to correlate this with predominant traffic flows and their volumes.
• The overlay of so-called "centers" (produced by CAMPO) is, put simply, worthless to misleading. Major centers are excluded (e.g., north Central Market area, Seton medical center area, Triangle), and the heart of the city — stretching from west of Loop 1 (MoPac) to east of Pleasant Valley Road — is represented as one enormous activity "center". This is meaningless — it tells us nothing about actual activity centers and their sizes or locations. Furthermore, this representation of "centers" encompasses vast areas of residential land use, which effectively contradicts the basic significance of an "activity center".
• The travel demand model visual representations (pp. 42-44), attributed to CAMPO, are labeled as projections for 2035. As AURA maintains in its AURA Subcorridor Choice Guide, "Data is Better Than Projections for Ridership Estimation". "We recommend the use of the real-world, historical data in the 2010 maps over the hypothetical projections in the 2030 maps when assessing routes' ridership potential. We believe use of the real-world, recently-observed data gives the more accurate and reliable picture of potential ridership, as well as the greatest viability for federal funding."
Projecting economic trends over the course of 17 years is very difficult. Projecting policy in an ever-changing political world is even more uncertain. While the 2010 data and the 2030 projections are both presented as cut-and-dried figures, only the 2010 data has the benefit of being a measurement of the real world. Project Connect's 2030 projections, on the other hand, represent a single-point estimate within an unknowably wide range of possibilities for the future. It is unwise to assume that city policy will remain both stable and powerful enough to match actual growth over the next 17 years to 2012 targets amidst wholesale changes to city political structures.
Yet these 2035 projections are the ONLY actual travel demand vectors displayed — the ONLY visually rendered data in the entire Map Book that purport to represent actual TRAVEL PATTERNS!
Basically, the Map Book provides a generally interesting but disassociated inventory of unvalidated data for assorted, segmented sectors of the central city. There is nothing that correlates with the travel-demand data of potential urban rail corridors under consideration, and in particular the key consideration of TRAVEL DENSITY.
So again, I would reject this in its current form as anything approaching an adequate data basis for decisionmaking to determine an initial urban rail corridor.
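The bubble-size mismatch mentioned in the Map Book comments above is the kind of thing that can be checked mechanically. What follows is only a rough sketch, assuming the common convention that a bubble's drawn area (not its radius) should be proportional to the value it represents; the function names and the 20-unit maximum radius are made up for illustration and have nothing to do with how the Map Book itself was produced.

```python
import math


def bubble_radius(value, max_value, max_radius=20.0):
    """Radius that makes the bubble's AREA proportional to value.

    max_radius is a hypothetical drawing size for the largest value.
    """
    if value < 0 or max_value <= 0:
        raise ValueError("value must be >= 0 and max_value must be > 0")
    return max_radius * math.sqrt(value / max_value)


def area_ratio_error(value_a, value_b, radius_a, radius_b):
    """Relative gap between the drawn area ratio and the data ratio.

    0.0 means the two bubbles are sized consistently with their values.
    """
    data_ratio = value_a / value_b
    area_ratio = (radius_a / radius_b) ** 2
    return abs(area_ratio - data_ratio) / data_ratio


# A value one quarter as large gets half the radius (one quarter the area).
print(bubble_radius(25, 100))
# Scaling the radius linearly instead (20 vs. 5) overstates the 4:1 ratio.
print(area_ratio_error(100, 25, 20.0, 5.0))
```

A nonzero result from `area_ratio_error` would flag a pair of bubbles whose drawn sizes exaggerate or understate the difference in the underlying numbers.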
- 10/30/2013 12:35:46 PM
I'll add a phrase I saw from Dan Bricklin, a well-known technologist who co-created VisiCalc, the first spreadsheet: the use of data scientist and business analyst is an "unfortunate collision of terminology."
- 10/30/2013 11:53:42 AM
I like how you summed up the difference. Sometimes analysis requires a bit of research, but it is mostly gathering facts, instead of establishing a metric or an ideal data range.
- by Nnanci, Blogger
- 10/30/2013 11:38:20 AM
In the strict sense of the word, if it's mathematical, it's going to be considered a science. But the way I see it, business analyst work need not be titled with the word "science," since it isn't typically done with a view to making discoveries that can be generalized for all similar situations.
- 10/30/2013 10:51:38 AM
Didn't know that about Jimmy Carter, but with public figures, it figures that their background would be questioned a little with respect to title and responsibility. People want leaders' titles to match who they are, in any field.
- by kq4ym, Data Doctor
- 10/30/2013 10:40:15 AM
The job title argument reminds me of the chatter when former President Jimmy Carter said he was a nuclear engineer. Having worked aboard Navy submarines, it seemed an elegant way to present oneself. But many disputed that he should be using the term engineer. So, we're still in some disagreement about just what terms to use. Business or data analyst? In some ways they do the same job, but in others, not so much.
- by BethSchultz, Blogger
- 10/30/2013 8:17:53 AM
And you're in Austin. Can you imagine how much more of a weapon the data becomes in bigger, more notably politically corrupt cities like here in Chicago where I live? On the upside, it sounds like you've got a great many challenges to tackle, which will keep you busy!
- by Lyndon_Henry, Blogger
- 10/29/2013 11:03:29 PM
It must be curious to have both sides using the same datasets to make their cases. Is this about the ability to manipulate data to show the results you'd like to show or is it more about picking which elements of a dataset best tell your story?
Beth, I don't want to dive too deeply (in this short comment) into this really complex issue, but I'll try to simplify and summarize as best I can. Basically this is a kind of debate between a major coalition of grassroots citizens' groups (G-L corridor coalition) and a consortium of official agencies (Project Connect, PJ).
Each group may be drawing on different data sets; the PJ side so far has not been forthcoming about its sources of data. The G-L side has been drawing mainly on the US Census data sets I mentioned earlier.
Data "manipulation" is also involved: in an urban area, you can draw the boundaries of urban sectors in different ways, yielding different demographic profiles and different data.
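To make the boundary point concrete, here's a toy sketch. Everything in it is invented for illustration: four hypothetical blocks along a line, with made-up resident counts and areas, and a helper called `sector_density`. None of it comes from the Census or Project Connect data. It only shows that the same blocks, grouped by a different boundary, yield different sector densities.

```python
# Hypothetical blocks along a line: (position_km, residents, area_km2).
# All numbers are made up for illustration only.
BLOCKS = [
    (0.5, 4000, 1.0),
    (1.5, 3500, 1.0),
    (2.5, 500, 1.0),
    (3.5, 400, 1.0),
]


def sector_density(blocks, boundaries):
    """Group blocks into sectors split at `boundaries` (positions in km)
    and return residents per km2 for each sector, in order of position."""
    edges = [float("-inf"), *sorted(boundaries), float("inf")]
    densities = []
    for lo, hi in zip(edges, edges[1:]):
        inside = [b for b in blocks if lo <= b[0] < hi]
        people = sum(b[1] for b in inside)
        area = sum(b[2] for b in inside)
        densities.append(people / area if area else 0.0)
    return densities


# Same blocks, two different ways of drawing one sector boundary.
print(sector_density(BLOCKS, [2.0]))  # split between the 2nd and 3rd block
print(sector_density(BLOCKS, [1.0]))  # split between the 1st and 2nd block
```

With the boundary at 2.0 km the two dense blocks land in one sector; moving it to 1.0 km folds one of them in with the sparse blocks, so the second sector looks much denser than before even though nothing on the ground has changed.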
It's messy. The summary above at least gives some idea of the issues involved. In any case, it's a situation with data flying through the air like cruise missiles. And then you get into the politics...
For more info on this exciting drama, visit:
- 10/29/2013 10:52:55 PM
Seth, I agree about the title highlight. The titles data scientist and business analyst are sometimes used so interchangeably that it can diminish what to expect from their effort. Part of the value from analytics and business intelligence in general is the expectation of roles. And Beth is right that curiosity should be at the core of capability.
- by Phoenix, Data Doctor
- 10/29/2013 12:42:40 PM
The data scientist also needs to know what type of software should be used for analysis. The ability to organise the data in a way that it can be analysed properly is also important. A data scientist should also know the right tool to use in a given situation.