Don't Conflate or Confuse Database Consistency with Truth


(Image: Semisatch/Shutterstock)

(Image: Semisatch/Shutterstock)

Disregard for foundation knowledge and failure to learn from past mistakes by even data professionals deemed experts inhibit progress in data management and bring back problems already resolved that should be of foremost concern to data analysts. Consider the following:

"Above all else, we count on databases to reflect the truth consistently, or at least to reflect the table data perfectly. The database cannot be blamed when an application (or the end users of an application) place inaccurate data in its tables, but a database must accurately report the data it holds. Therefore, bugs are not all created equal; there are bugs, and there are wrong-rows bugs, bugs that silently misrepresent the data that the tables hold. Even the craziest, most obscure corner case that potentially misrepresents your data should rightly bring a loud chorus: "The emperor has no clothes!" We depend on the database, above all, not to lie."

(Note: The common misuse of 'database' to refer interchangeably to both the data and the DBMS -- the software managing the data -- induces confusion (e.g., a database 'is' the data, so the database always reflects itself perfectly -- a tautology). What we are talking about here is 'data integrity' and 'database consistency' -- different issues altogether.)

In the database context both 'truth' and 'consistency' are critical, but they are distinct and should not be confused, or conflated. We can and should count on DBMSs to guarantee database consistency with the conceptual model of the real world it represents -- "wrong tuples" are not bugs, but functional failure. On the other hand, a DBMS cannot (and should not be expected to) ensure truth.

A 'mathematical' relation is an abstraction -- tuples have no meaning. 'Database' relations, on the other hand, are assigned meaning by database designers: they represent facts about members of real world object groups. For example, the tuples in the relation

EMPLOYEES {EMP#,ENAME,DEPT#,SALARY}

represent facts about employees of the type:

Employee identified by employee number (EMP#) has name (ENAME), works in department with department number (DEPT#), earns salary (SALARY).

Substituting values for the parenthesized terms yields facts about individual employees. But the values are not arbitrary, relations must be consistent with the conceptual model of reality it represents -- i.e., the business rules that specify the defining property values of the corresponding object groups. The rules are formally represented in the database by integrity constraints, enforced by the DBMS, that constrain relations to be consistent with the rules -- i.e., have only valid attribute values.

Note very carefully, however, that a tuple can satisfy the constraints -- i.e., consist of valid values -- yet represent a fact that is not true in the real world -- i.e., not about an actual employee! Hence the difference between consistency and truth.

When the database is queried, a true RDBMS applies the set operations of the 'relational algebra' (RA) to one or more database relations to derive other relations as results. Database tuples represent facts that are 'axioms' and result tuples facts that are 'theorems' in formal logic. 'If -- and only if' -- the axioms are true, the theorems are 'provably' true -- i.e., they are logical implications of the theorems. Querying is, thus, 'inferencing': RA queries are, in effect, logic 'proofs'; a true RDBMS together with a fully normalized relational database form a deductive system and query results are 'system-guaranteed' to be logically correct.

Ideally only true axioms should be represented in the database. But while a DBMS can enforce declared constraints for database consistency, it cannot ascertain truth (e.g., that there is an actual employee with specific property values in the real world). The facts recorded in the database are, thus, not statements of objective truth about the world -- they are 'assumed to be true only because they were asserted by trusted authorized users' and are as true as the trust accorded those users.

Note that even consistency is system-guaranteed 'if and only if' both the DBMS and databases are truly relational. The reality is they are not, so analysts: caveat emptor!

[Special Offer: Fabian Pascal is offering AllAnalytics readers a 25% discount on his latest book and two series of papers. Contact him by email via the About page at dbdebunk.com.]

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.

Structure, Integrity, Manipulation: How to Compare Data Models

Is that new data trend actually something that you really need, or you could risk being left behind? Or is it just a buzz word or a fad?

Data Meaning: Analytics vs. Data Mining

Can you reconstruct the meaning for a database when you have no documentation? Here's a deep dive.


Re: Finding the Truth
  • 9/6/2017 12:32:44 PM
NO RATINGS

It is possible to have a tuple that represents a fact about some object that satisfies the constraints, but nevertheless is not true in the real world. This could be due to an inadvertent mistake, or intentional.

 

 

Finding the Truth
  • 9/6/2017 10:09:15 AM
NO RATINGS

Interesting how such a simple concept can lead to some very embarassing if not costly mistakes in any company. The example cited of how placing a non-existent "employee" might from the outside look like the database is working fine but while the "DBMS can enforce declared constraints for database consistency, it cannot ascertain truth." A close examination of all will be a prudent process.

INFORMATION RESOURCES
ANALYTICS IN ACTION
CARTERTOONS
VIEW ALL +
QUICK POLL
VIEW ALL +