Data Sublanguages, Programming, and Data Integrity


Both data science employers and candidates stress the eclectic nature of the required skills, programming in particular. Indeed, coding has acquired such an elevated role that it is now promoted as a replacement for education. Aside from the socially destructive consequences of this trend, in the context of data management it is a regressive, self-fulfilling prophecy that obscures and disregards a core practical objective of database management: to minimize programming. You can frequently encounter it in comments like:
    "Anything you can model in a DBMS you can model in Java. The next paradigm shift is business rules centralized in Java business objects, rather than hard-coded in SQL for better manageability, scalability, etc. The only ones that should reside in a database are referential integrity (and sometimes even that isn't really necessary). Don't let pushy DBAs tell you otherwise -- integrity constraints slow down development as well as performance."

Upside down and backwards.

First, programming languages such as Java are computationally complete and as such are based on higher logic than first order predicate logic (FOPL). While expressively more powerful, they allow self-referencing and, therefore:

  • Are not decidable, i.e., are susceptible to the halting problem
  • Cannot support data independence -- the insulation of applications and users from physical and logical data reorganizations
  • Are imperative (procedural)
  • Are more complex than the relational algebra

What unfortunately remains misunderstood to this day is that the relational data model (RDM) was introduced to address these drawbacks. FOPL-based relational data sublanguages are decidable, support data independence, are declarative (non-procedural), and are much simpler. Computational completeness for application development is achieved by hosting them in programming languages.
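As a rough illustration of hosting, the sketch below embeds a declarative query in a host language and compares it with a hand-written procedural equivalent. It assumes Python with its standard sqlite3 module, and it uses SQL only as a stand-in for a relational data sublanguage; the employee table and its columns are hypothetical.

```python
import sqlite3

# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, dept TEXT NOT NULL, salary INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "sales", 50000), (2, "sales", 70000), (3, "ops", 60000)],
)

# Declarative: the query states WHAT is wanted; the DBMS decides HOW to get it.
declarative = conn.execute(
    "SELECT dept, SUM(salary) FROM employee GROUP BY dept ORDER BY dept"
).fetchall()

# Procedural: the host program spells out the iteration and accumulation itself.
totals = {}
for _emp_id, dept, salary in conn.execute(
    "SELECT emp_id, dept, salary FROM employee"
):
    totals[dept] = totals.get(dept, 0) + salary
imperative = sorted(totals.items())

# The host language supplies everything that is not a data operation,
# here the formatting of a report.
report = [f"{dept}: {total}" for dept, total in declarative]
```

Both paths yield the same totals, but only the procedural one has to be rewritten by hand if the data is reorganized or a better evaluation strategy becomes available.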

(Image: nmedia/Shutterstock)

Second, many years ago I wrote an article titled “Integrity is Not Only Referential” with a double entendre: I was criticizing a DBMS vendor that claimed, misleadingly, that its product’s application-enforced integrity was actually relational DBMS-enforced integrity, and I was also deploring data professionals’ poor grasp of this important distinction, which let vendors get away with it. Sadly, nothing much has changed since then.

Of the several types of relational integrity constraints -- domain, attribute, tuple, multi-tuple, and database (multi-relation) -- data professionals are superficially familiar with only one type of multi-tuple constraint, the key constraint, and one type of database constraint, the foreign key (referential) constraint. Unaware of the rest, they do not demand DBMS support for all integrity constraints, and they often see enforcement of even the two they know as unnecessary, without realizing the deleterious implications.
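To make the constraint types more concrete, here is a minimal sketch of how some of them can be approximated declaratively. It assumes Python's standard sqlite3 module and SQL as the language, which covers the types only partially (SQLite has no separate domain definitions, for example). The tables, columns, and the rejected helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                  -- multi-tuple: key constraint
    salary  INTEGER NOT NULL CHECK (salary > 0),  -- attribute constraint
    hired   TEXT NOT NULL,
    left_on TEXT,
    dept_id INTEGER NOT NULL
            REFERENCES department(dept_id),       -- database: foreign key
    CHECK (left_on IS NULL OR left_on >= hired)   -- tuple constraint
);
INSERT INTO department VALUES (10);
INSERT INTO employee VALUES (1, 50000, '2016-01-04', NULL, 10);
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# One violating statement per constraint type (hypothetical data).
violations = {
    "attribute": "INSERT INTO employee VALUES (2, -5, '2016-01-04', NULL, 10)",
    "tuple": "INSERT INTO employee VALUES (3, 100, '2016-06-01', '2016-01-01', 10)",
    "multi-tuple (key)": "INSERT INTO employee VALUES (1, 100, '2016-01-04', NULL, 10)",
    "database (foreign key)": "INSERT INTO employee VALUES (4, 100, '2016-01-04', NULL, 99)",
}
results = {kind: rejected(sql) for kind, sql in violations.items()}
```

Each statement is refused by the DBMS itself, so the rules hold no matter which application issues the insert.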

"Constraints out of data access code" is a regress to the pre-database days of application-enforced integrity and is the opposite of improving manageability and productivity. Aside from preventing the subversion of constraints, which leads to database inconsistency and should be every analyst’s concern, DBMS-enforced integrity would actually improve performance by allowing the DBMS to integrate constraints into its optimization strategy, something that is difficult to do if the constraints are enforced with triggered procedures and impossible if they are scattered in application code.
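The subversion risk can be sketched as follows, assuming Python's standard sqlite3 module. The two account tables and the two "applications" are hypothetical: one table relies on application code to keep balances non-negative, the other declares the rule to the DBMS. The sketch illustrates only the subversion point, not the optimization argument.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rule "balance >= 0" lives only in application code.
CREATE TABLE account_app_checked (
    acct_id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL
);
-- Same rule declared to the DBMS.
CREATE TABLE account_dbms_checked (
    acct_id INTEGER PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)
);
""")


def app_a_insert(table, acct_id, balance):
    """Application A: enforces the rule in its own code."""
    if balance < 0:
        raise ValueError("balance must not be negative")
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (acct_id, balance))


def app_b_insert(table, acct_id, balance):
    """Application B: written later, without knowledge of the rule."""
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (acct_id, balance))


# Application A behaves, but it is not the only path to the data.
try:
    app_a_insert("account_app_checked", 2, -50)
    app_a_rejected = False
except ValueError:
    app_a_rejected = True

# Application B silently corrupts the table that relies on application code...
app_b_insert("account_app_checked", 1, -100)

# ...but cannot corrupt the table whose rule the DBMS enforces.
try:
    app_b_insert("account_dbms_checked", 1, -100)
    dbms_rejected = False
except sqlite3.IntegrityError:
    dbms_rejected = True

bad_rows = conn.execute(
    "SELECT COUNT(*) FROM account_app_checked WHERE balance < 0"
).fetchone()[0]
```

With the rule declared once in the database, every application, script, and ad hoc session is subject to it; with the rule in application code, consistency depends on every program repeating it correctly.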

It is argued that "The analyst shouldn't be worrying about which [DBMS] solution is being implemented, that is the data engineer's job, but rather what business value or insights can be extracted from the data." Well, those insights depend on the integrity of the data in databases, which in turn depends on how reliably it is enforced. Unfortunately, SQL DBMSs do a very poor job of it and NoSQL ones practically none, which should put the analyst on guard.

This is an excerpt from The DBDebunk Guide to Misconceptions About Data Fundamentals, available at dbdebunk.com.

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.



Re: Coding Class
  • 1/7/2017 4:41:53 PM

It does seem that educational trends come and go, and the emphasis on the "one best" way of doing things doesn't always stand the test of time. The exclusive emphasis on coding is most likely one of those trends that will see its philosophy and practice modified.

2 programming issues
  • 1/2/2017 6:35:59 PM

We should distinguish between programming in general and in the database context.

While they are related, they are distinct. The former, programming life itself, is what the corporate world wants--the brave new world of AI, machine learning, algorithmifying everything.

My post is about the latter. The relational approach is specifically about minimizing programming in database management, for reasons that the post explains.

Re: Coding Class
  • 1/1/2017 7:56:19 PM

Michelle,

There is more than ample evidence that there are NO good intentions whatsoever about substituting coding for education. It is socially destructive and it is intended for control, manipulation and exploitation. This is clear from what the tech companies are doing.

Re: Coding Class
  • 1/1/2017 7:53:16 PM

As I put it in my post.

There is a huge danger in having every kid programmed to code, but without an education--intellectual development, maturity, life experience. This is how regimentation and tyranny start.

All of this happens in part because the coding is done by young, immature dropouts who are morally and empathy challenged, some of whom become zillionaires and step on cadavers.

Re: Decidable?.
  • 1/1/2017 7:43:34 PM

A language that allows self-referencing is prone to paradoxes, which means that truth cannot always be decided--in programming terms, it suffers from the "halting problem". Neither can it support physical independence -- the insulation of applications and queries from organization and reorganization of storage.

This is true of all computationally complete programming languages because they are based on higher logic than first order logic, which allows self-referencing. A relational data sublanguage is based strictly on first order logic and is hosted in programming languages for anything other than data operations, e.g., application development.

Re: Coding Class
  • 12/30/2016 2:14:18 PM

@louisw900, I couldn't agree more! Too narrow focus is a long-term detriment in my estimation.

Re: Coding Class
  • 12/29/2016 2:22:26 PM

@Michelle   I agree.  Coding has its usefulness, but coding alone will not overcome the divide that is present when it comes to children improving in areas of STEM.

Re: Coding Class
  • 12/29/2016 1:40:50 PM

PC, the push towards code seems a bit aggressive because it is presented as a replacement, not as an addition. I've felt too often that we've become too much of an either-or society. Specialization is efficient but limiting.

Re: Coding Class
  • 12/29/2016 1:25:56 PM

@PC I could get behind problem solving tasks that involve code, but I don't think learning code alone is best. I think the movement to push code into education is well-intended, but needs work. 

Coding Class
  • 12/28/2016 3:12:54 PM

@Fabian

You always help deepen my understanding of the importance of sticking to fundamentals.

There is a trend in some circles that "everyone should learn to code", meaning that everyone in K-12 should have a programming class. Although I enjoy code, this has always seemed to me a poor way to educate everyone. With this post, I think I better understand why I feel this way.
