Brother, Spare Me the Paradigm

Every few years (and the interval is getting shorter) data management is claimed to undergo a “paradigm shift”: some fundamentally different new way of doing things is promoted (e.g., “Consider dimensional design and Big Data as two additional paradigms”), and you are warned that if you don’t adopt it you’ll be left behind (I’ve written about Big Data; for data warehouse dimensional modeling see Data Warehouses and the Logical-Physical Confusion). For the few who understand what a paradigm is and are familiar with the history of data management, the irony could not be richer.

Paradigm shifts are associated with Thomas Kuhn’s account of scientific progress. A paradigm is an exemplar of a broad-scope theory that practitioners in a particular field admire and emulate. A field in a pre-paradigmatic state is characterized by disunity of purpose and method. A paradigm characterizes normal science: agreement on what research should be done and how. When anomalies (problems that the theory cannot account for) accumulate, practitioners lose confidence in the theory and the field undergoes a crisis. A new theory emerges as the paradigm, accounting for what the old one did as well as for the anomalies, and normal science returns until the next crisis (see THE MEANING OF SCIENCE). Darwin’s theory of evolution is a well-known paradigm, and the move from the Newtonian to the Einsteinian paradigm is a well-known shift.

Before the 1970s, data management was in a pre-paradigmatic state: there were only application programs and application-specific data files, the infamous redundant "islands of information". The importance of functions such as integrity, security, concurrency control, physical management, and performance optimization had yet to be realized and, when it was, they were implemented redundantly and unreliably by each and every application program, a prohibitive and costly burden that was mostly avoided altogether. This was unsustainable; inevitably, problems accumulated and produced a crisis.

What emerged was the database concept:

  • Application-specific data files were replaced by "neutral" databases to eliminate redundancy and serve diverse applications concurrently.
  • Some database functions were centralized in DBMSs, reducing the development and maintenance burden, increasing reliability, and freeing applications to focus on their own tasks (e.g., analytics).

But the first-generation hierarchic and network databases and DBMSs missed a critical ingredient of a paradigm. They were diverse and proprietary, abstracted in an ad hoc manner from existing practice rather than based on a theoretical foundation. Post hoc attempts to fit one (directed graph theory) failed and were abandoned because it proved prohibitively complex and inflexible in practice, which triggered another crisis.

The relational data model (RDM), with its dual formal foundation of first-order predicate logic (FOPL) and simple set theory, put data management on a sound scientific basis. It is no coincidence that the first attempt to implement the RDM, SQL, became the de facto lingua franca of database management and an ANSI/ISO standard. But while even the poor relational fidelity of SQL was sufficient to render it superior to the technologies that preceded it (and it isn’t devoid of the ability to support analytics applications either), SQL DBMSs fail to confer most of the core practical benefits of the RDM, first and foremost among them guaranteed logical and semantic correctness.
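To make the "set theory plus predicate logic" foundation a little more concrete, here is a toy sketch in Python. It is only an illustration under stated assumptions, not how any DBMS (relational or SQL) is implemented: the Employee relation, its attributes, and the two constraints are hypothetical names invented for the example. A relation is modeled as a set of tuples, and integrity constraints as predicates quantified over that set.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """One tuple of a hypothetical EMPLOYEE relation (illustrative only)."""
    emp_id: int
    dept: str
    salary: int


# A relation is a *set* of tuples: there is no ordering, and a repeated
# tuple is the same element, so it cannot appear twice.
employees = {
    Employee(1, "sales", 50_000),
    Employee(2, "sales", 55_000),
    Employee(2, "sales", 55_000),  # same tuple again; the set keeps one copy
    Employee(3, "research", 70_000),
}


def salary_is_positive(relation):
    """Constraint as a universally quantified predicate: for all t, t.salary > 0."""
    return all(t.salary > 0 for t in relation)


def emp_id_is_key(relation):
    """Key constraint: no two distinct tuples share an emp_id."""
    return len({t.emp_id for t in relation}) == len(relation)


def restrict(relation, predicate):
    """Restriction: the subset of tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project_dept(relation):
    """Projection on dept: the result is again a set, so no duplicates."""
    return {t.dept for t in relation}


sales = restrict(employees, lambda t: t.dept == "sales")
depts = project_dept(employees)
```

The point of the sketch is only that the constraints are declared once, over the data itself, rather than re-implemented inside every application that touches the data.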

Unfortunately, due to lack of proper education and poor foundation knowledge, today’s data professionals in both user and vendor organizations are not familiar with the history of the field, deem SQL DBMSs relational, and do not know or appreciate the practical benefits that are missing (for examples see DatabaseDebunkings), which prevents the RDM from serving as the paradigm of normal data management and scientific progress that it inherently is. So they do not realize that most of the so-called new “paradigms” are essentially regressions to the old, pre-paradigmatic state of ad hoc, diverse, proprietary, application-specific technologies that are inferior to even SQL: the very opposite of Kuhnian scientific progress.

Old graph DBMSs suffixed their names with /R. The “new” ones promise to deliver NoSQL capabilities, but suffix their names with QL. Some paradigm shift.

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.

Re: Ironic
  • 9/29/2016 11:19:50 PM

Agreed. Some of the opinions I hear against Science are simply mind boggling.

Re: Ironic
  • 9/29/2016 6:58:37 PM

This is a systemic and cultural problem.

If a society does not reward and, in fact, punishes science (even mere disregard disincentivizes it), there is no reason to expect it.

Re: Ironic
  • 9/29/2016 6:41:24 PM

Yes, science requires a couple of actors before it can be successful:

  1. Someone capable and willing to perform science
  2. Someone willing to listen to and act on the results

In my experience, a lack of the latter is at least as common as a lack of the former. But my sample may be biased.

Re: Ironic
  • 9/29/2016 2:16:44 AM

Not if nobody is capable and willing to apply it.

  • 9/29/2016 12:02:35 AM

Ignorance, stupidity and poor training are powerful foes.

Fortunately, rational science can offer real results.