NoSQL, Big Data Analytics, and the Loss of Knowledge and Reason


The data management industry operates like the fashion industry. Its most persistent characteristic is migration from fad to fad. Every few years -- the number keeps getting smaller -- some "new" problem is discovered, for which the solution is so magical that it is extended everywhere to everything, whether it is applicable or not.

But many of these problems are old and fundamental and some of the “solutions” bring them back, rather than solve them.

The current solution is big data analytics, seen as the technology for solving everything from terrorism to tuberculosis, and the problem is data complexity:

    "... the key challenge is not data size but complexity ... To make a Big Data initiative succeed, the trick is to handle widely varied types of data, disparate sources, datasets that aren’t easily linkable, dirty data, and unstructured or semi-structured data ... But … You don’t get a Big Data club card just for changing your old (but still trustworthy) data warehouse into a data lake (or even worse, a data swamp).” -- Big Data: The Key is Bridging Disparate Data Sources

Quite. Except that old geezers, er, experienced professionals like me remember complexity as the "islands of information" of the days of proliferating redundant application-specific files and application programs that would not talk to each other. The marketeers of "integrative solutions" hyped them then just as those of big data analytics do today. Complexity is still with us because we mindlessly generate complexity much more easily and quickly than any "agile" magic wand can extract reliable useful information from it. But instead of addressing this fundamental problem, we accelerate it.

    “The NoSQL flavor of databases has come en vogue in the last few years in certain technology sectors, primarily ones that are evolving so quickly that having to slow down to put forethought into your data store and how it's going to be structured might literally be the difference between your whole company suceeding or not.” --ignoredbydinosaurs.com

Forethought has become an impediment, rather than a success factor -- one reason I find the "science" in data science, as practiced, highly questionable. When real science education still existed, it drilled into me that science is all about knowledge, reasoning, and forethought. This is exactly what we're now trying to avoid at all cost, because, as vocational training is substituted for education and you gotta be a dropout to succeed, thought -- not just forethought -- becomes increasingly difficult and even discouraged.

That explains the attractiveness of NoSQL and big data analytics, i.e., forward to the past:

    “There are no [design] rules of normalization for [NoSQL] databases … Which means you're designing the data organization to serve specific queries. So follow the same principle in NoSQL databases as you would for denormalizing a relational database: design your queries first, then the structure of the database is derived from the queries.” --What is a good way to design a NoSQL database?

Using the term "design" in this context is, of course, misleading. Design requires forethought: it anticipates the types of likely queries, models reality, and pre-structures data so as to simplify integrity enforcement and data manipulation. NoSQL and big data analytics do the opposite.

    “In various organizations, data modeling for NoSQL emphasizes the roles of [application] developers while deemphasizing the roles of data modelers and database administrators.” --Donovan Hsieh, eBay’s Senior Data Architect

In other words, we mindlessly pile up complexity and put our trust in minimally educated coders and machines “more intelligent than us” to tackle it for our benefit. The consequences are no different from those of application-managed data: a struggle to optimize data structured for specific uses for multiple uses, and post-facto structuring anyway. They are also scary: the constant atrophiation of human intellect, coupled with machines programmed by corporate interests to extract, not produce and distribute, wealth, is socially destructive.
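To make the point about data structured for specific uses concrete, here is a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the customers, the orders, the table and function names. It stores the same facts twice, once as documents shaped for a single query ("show a customer's orders") and once as relations with declared keys, using Python's standard sqlite3 module as a stand-in for a SQL DBMS.

```python
import sqlite3

# Hypothetical documents, shaped for one query: "show a customer's orders".
customer_docs = {
    "alice": {"city": "Lisbon",
              "orders": [{"item": "lamp", "qty": 2}, {"item": "desk", "qty": 1}]},
    "bob": {"city": "Oslo",
            "orders": [{"item": "lamp", "qty": 1}]},
}


def orders_for(customer):
    """The query the documents were designed for: a single lookup."""
    return customer_docs[customer]["orders"]


def total_qty_by_item_docs():
    """An unanticipated query: application code must walk every document."""
    totals = {}
    for doc in customer_docs.values():
        for order in doc["orders"]:
            totals[order["item"]] = totals.get(order["item"], 0) + order["qty"]
    return totals


# The same facts as relations with declared keys and constraints.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript("""
    CREATE TABLE customer (
        name TEXT PRIMARY KEY,
        city TEXT NOT NULL
    );
    CREATE TABLE customer_order (
        customer TEXT NOT NULL REFERENCES customer(name),
        item     TEXT NOT NULL,
        qty      INTEGER NOT NULL CHECK (qty > 0),
        PRIMARY KEY (customer, item)
    );
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("alice", "Lisbon"), ("bob", "Oslo")])
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?)",
                 [("alice", "lamp", 2), ("alice", "desk", 1), ("bob", "lamp", 1)])


def total_qty_by_item_sql():
    """The same unanticipated query, stated declaratively."""
    rows = conn.execute(
        "SELECT item, SUM(qty) FROM customer_order GROUP BY item")
    return dict(rows.fetchall())
```

In this sketch the document layout answers its one intended question cheaply, but every question it was not designed for becomes application code, and nothing prevents an order from referring to a customer that does not exist. In the relational layout, the keys and constraints are declared once and the new question is a single declarative statement.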

Shouldn't we strive to avoid complexity via forethought, rather than keep increasing it? That's what database management -- and relational database management in particular -- were devised to address, but are being dismissed and ignored, because they require scarce intellectual abilities. They are yesterday's fads.

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.



Re: You're sort of right
  • 11/9/2016 5:17:25 AM

I did not receive a notification of your comment at the time, so I was not aware of it. I came across it only now because I'm revisiting this post for an upcoming post responding to a comment about it on Hacker News.

Jim's comment was accurate regarding the meaning of my post. However, I will also add:

1. I stand by my claim of a TREND to substitute training for education. The evidence for it is overwhelming. In order to be a scientist one must have a scientific education. I would argue that in the last 2-3 years a multitude of "data science" entrants did not have it. This does not mean that there aren't exceptions, but it's important not to let them distract from the rule.

2. Your comment about "geezers" is pretty revealing. On the one hand, ageism in the tech industry is rampant--once you reach age 45-50, it becomes practically impossible to be hired--while on the other hand, job ads have few education requirements, but demand years of experience. In fact, what they want is immature youth with little education but years of experience--a rather glaring contradiction. Dismissing the experience that comes with age, particularly when it comes to work that requires scientific knowledge, is a costly mistake. Technology and innovation are fine, but if they are not soundly grounded they end up in all those failing startups and can bring a society down in the long run. We can see signs of this already.

 

 

Re: You're sort of right
  • 7/22/2016 8:36:34 AM

@DavosCollective. Thanks for the thoughtful post. Welcome to the conversation. I think one of the issues at hand, and what may have inspired Fabian's blog, is that the term data scientist is being applied pretty freely. I'm sure there are people with the title out there who fit the general definition of scientist. However, there also are people who have been analysts their whole lives and they adopt the data science title because it's the hot thing. Plus, their employers, some recruiters, and even some educators are getting pretty casual with how they use the title data scientist. It's the type of thing that happens when any business/tech concept becomes popular. (Just consider the ongoing discussion here on the site about "real time". All of us might know it's important but the individual definitions range from sub-second to 24 hours).

I suspect that all of us have occasionally used the titles analyst and data scientist kind of casually and interchangeably. I think that's human nature, not a slap at white-coated, highly educated scientists.

 

Re: You're sort of right
  • 7/22/2016 8:06:05 AM

It's true there have been a lot of new vocationally focused entrants to the tech education market, companies like Code School, Treehouse and Codecademy. The MOOC explosion of courses from the likes of Coursera, edX and Udemy is also a new phenomenon.

Vocational education is not a new thing though, and it has not yet replaced theory-focused education, and likely never will, in spite of the fears of geezers, er, senior industry people who think that younguns are getting out of doing the hard yards, or that their years of schooling and toil are being undermined by these mythical millennials. This myth that millennials have overinflated opinions of themselves and expect their first role to be CEO is just so tiresome, and thoroughly untrue. I know, I work with plenty of them and I can tell you they are as high achieving, studious and hard working as any other generation. The same sledges are hurled at every youth generation. The trope is boring, let it go.

It's plain curmudgeonly to talk about "the constant atrophiation of human intellect". There have always been stupid people and smart people, so what's changed? The base level of knowledge has increased, the number of subject areas to study has increased, literacy and numeracy levels have increased. All this in spite of soap operas and reality television and social media playing havoc with entropy.

Back to your comments about vocational training. To learn the art and science of programming, you need both computer science foundations (theory) and you need to learn language syntax and toolsets (vocational). Take the sponsor of this forum as an example. SAS is a proprietary set of software packages that require training to gain expertise. If I go for a job requiring SAS then it certainly helps if I have a stats or maths theoretical background, but what if that was taught in the context of MATLAB or R or something not SAS? Some of it is transferable, yes, but tools have idiosyncrasies you can't learn without training or experience. Of course, having all three of theory and training and experience is what leads to expertise.

I work at a leading quant company and we certainly don't hire people who've done a 2 week bootcamp and have no degree. The graduate intake program has a quantitative degree as a prerequisite (actuarial, physics, comp sci etc). Vocational courses and certifications are great, but only as an adjunct to education and/or experience. When looking at candidates, the most valuable qualification is experience, then degree, then certification / vocational courses, in that order. If you have all of the above then even better. If you're a grad, well, you have no experience so the degree is most valuable at that point in your career. Professional association membership, well that's great too.

I completely agree that forethought is important, and generally with most of your points, but have to take issue with this statement:

"When real science education still existed, it drilled into me that science is all about knowledge, reasoning, and forethought"

Firstly, real science education still exists, and that statement is exactly what I mean by curmudgeonly. It just detracts from the other excellent points you make. 

Secondly, that definition of science doesn't do it justice. Contemporaneous knowledge is limited and changes when new information becomes available. In short, it's fallible. That's the strength of science, that in the long run it marches forward. Sure, it's not true in the short term because people are stubborn, but in the end the prevailing theory changes. There are very few laws in science; it's mostly theories, conjectures and hypotheses, the result of which is knowledge that is "true" not in an absolute sense but to the best of our current understanding. Having knowledge is useless by itself; it's important to have but is powerful when coupled with the ability to integrate new knowledge, to change your mind.

Science based on reasoning in its various flavours is important, yes; it was the entirety of the scientific method pre-renaissance, but philosophers of science since then have acknowledged the flaws in reasoning. It's possible and quite common to have entirely logical premises yet arrive at a fallacy. Reasoning is a good starting point, but is often disproved by rigorous application of the scientific method, i.e. the results of well designed, robust and repeatable experiments.

The fourth paradigm of scientific discovery, data (ref. Tony Hey), takes the scientific method a step further. Perhaps machine learning over impossibly large (for a human) datasets can derive useful factors that we couldn't possibly arrive at with foresight alone. Of course we then have the arduous task of proving and validating those factors, but the point stands: foresight is not everything. Too much focus on foresight leads to waterfall approaches, whereas an iterative continuous improvement approach leads to better outcomes. Foresight alone as a methodology would only work with perfect and complete knowledge of everything including the future.

In summary, I take issue with your conclusion that "vocational training is substituted for education" and your statement about "the constant atrophiation of human intellect" and your implication that "real science education" was exclusive to your generation and by extension, you. It takes away from the good points you make about NoSQL not solving any new problems, and new fads being applied as magic bullets for every problem. I regret that this forum doesn't allow me to post links to references to support my refutations.

Thanks for this post, it gave me the opportunity to write this reply and along the way re-read some interesting things about the philosophy of science, epistemology and logic.

You're sort of right
  • 7/8/2016 2:17:11 PM

I have to admit that the education surrounding IT professionals seems to be falling off. It's becoming more of a vocational training path than one of true education. And that's scary.
