Every few years -- and the interval is getting shorter -- data management is claimed to undergo a “paradigm shift”: some fundamentally different way of doing things is promoted (e.g., “Consider dimensional design and Big Data as two additional paradigms”), with the warning that if you don't adopt it you’ll be left behind (I’ve written about Big Data; for data warehouse dimensional modeling see Data Warehouses and the Logical-Physical Confusion). For the few who understand what a paradigm is and are familiar with the history of data management, the irony could not be richer.
Paradigm shifts are associated with Thomas Kuhn’s account of scientific progress. A paradigm is an exemplar of a broad-scope theory that practitioners in a particular field admire and emulate. A field in a pre-paradigmatic state is characterized by disunity of purpose and method. A paradigm characterizes normal science: agreement on what research should be done and how. When anomalies -- problems that the theory cannot account for -- accumulate, practitioners lose confidence in the theory and the field undergoes a crisis. A new theory then emerges as the paradigm, accounting both for what the old one explained and for the anomalies, and normal science returns until the next crisis (see THE MEANING OF SCIENCE). Darwin’s theory of evolution is a well-known paradigm, and the move from the Newtonian to the Einsteinian paradigm is a well-known shift.
Before the 1970s, data management was in a pre-paradigmatic state: there were only application programs and application-specific data files, the infamous redundant "islands of information". The importance of functions such as integrity, security, concurrency control, physical management, and performance optimization had yet to be realized. When it was, each and every application program had to undertake them redundantly and unreliably, a prohibitively costly burden that was mostly avoided altogether. This was unsustainable; inevitably, problems accumulated and produced a crisis.
What emerged was the database concept:
- Application-specific data files were replaced by "neutral" databases to eliminate redundancy and serve diverse applications concurrently.
- Some database functions were centralized in DBMSs, reducing the development and maintenance burden, increasing reliability, and freeing applications to focus on their own tasks (e.g., analytics).
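A minimal sketch of what centralizing one such function, integrity, means in practice. It uses Python's built-in `sqlite3` module purely for illustration; the `account` table, its non-negative balance rule, and the `try_insert` helper are hypothetical and not taken from any system discussed here. The rule is declared once in the database, so no application has to re-implement it.

```python
import sqlite3

# One shared, "neutral" database (in memory here, for illustration only).
conn = sqlite3.connect(":memory:")

# The integrity rule is declared once, in the database itself.
# Hypothetical rule: an account balance may never be negative.
conn.execute(
    """
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)


def try_insert(connection, account_id, balance):
    """Stand-in for any application: it contains no validation code of its own."""
    try:
        connection.execute(
            "INSERT INTO account (id, balance) VALUES (?, ?)",
            (account_id, balance),
        )
        return True
    except sqlite3.IntegrityError:
        # The DBMS, not the application, rejected the invalid row.
        return False


accepted_valid = try_insert(conn, 1, 100)    # satisfies the declared rule
accepted_invalid = try_insert(conn, 2, -50)  # violates the declared rule
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
```

Under these assumptions, every program that goes through the DBMS is held to the same rule, whereas with application-specific files each program would need its own copy of the check.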
But the first-generation hierarchic and network databases and DBMSs missed a critical ingredient of a paradigm. They were diverse and proprietary, abstracted in an ad hoc manner from existing practice rather than based on a theoretical foundation. Attempts to fit a foundation after the fact -- directed graph theory -- failed and were abandoned because it proved prohibitively complex and inflexible in practice, which triggered another crisis.
The relational data model (RDM), with its dual formal foundation of first-order predicate logic (FOPL) and simple set theory, put data management on a sound scientific basis. It is no coincidence that the first attempt to implement the RDM -- SQL -- became the de facto lingua franca of database management and an ANSI/ISO standard. But while even the poor relational fidelity of SQL was sufficient to render it superior to the technologies that preceded it (and it is not without ability to support analytics applications either), SQL DBMSs fail to confer most of the core practical benefits of the RDM, first and foremost among them guaranteed logical and semantic correctness.
Unfortunately, due to lack of proper education and poor foundation knowledge, today’s data professionals in both user and vendor organizations are not familiar with the history of the field, deem SQL DBMSs relational, and neither know nor appreciate the practical benefits that are missing (for examples see DatabaseDebunkings). This prevents the RDM from serving as the paradigm of normal data management and scientific progress that it inherently is. So they do not realize that most of the so-called new “paradigms” are essentially regressions to the old, pre-paradigmatic state of ad hoc, diverse, proprietary, application-specific technologies that are inferior to even SQL -- the very opposite of Kuhnian scientific progress.
Old graph DBMSs suffixed their names with /R. The “new” ones promise to deliver NoSQL capabilities, but suffix their names with QL. Some paradigm shift.