The Necessity of Foreign Keys

Last month I showed how poor grasp of data fundamentals makes it difficult to understand different types of key and the necessity of primary keys (PK). There's another type of key that is poorly understood for the same reason:

    "... [we] wish to make a point. There is something which is bad design/good design/mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory."

Really? FKs were introduced as a relational solution to the problems of hierarchic database technology of the late 60s. Consider the actual example E.F. Codd used in 1970 -- a four-object hierarchy -- when he introduced the concept:

In the hierarchic databases of the time there would be four "segments" -- record types -- connected physically by pointers that would have to be navigated for data access. For example, SALARYHISTORY data would be accessible only via EMPLOYEE records and pointer navigation through the JOBHISTORY records. In other words, information about relationships between objects of different types was represented by the 'physical organization' of the data. Aside from the complexity, applications accessing data for analytical purposes had to reference them explicitly and any reorganization would impair applications, necessitating prohibitive maintenance.

Relational systems support 'physical independence' (PI) -- .e., applications are insulated from physical details. Data access is based exclusively on information content -- data values -- not machine internals that are unknown and irrelevant to application developers and users. That is why the core relational Information Principle (IP) mandates 'all' information in a relational database -- including about relationships -- be represented in 'only one way': as domain values in relations.

If a relation is substituted for each record type:

JOBHISTORY, SALARYHISTORY and CHILDREN are attributes defined on non-simple domains -- specifically, 'relation-valued domains' (RVD) -- i.e., that have relations as values. As I explained in Data Sublanguages, Programming, and Data Integrity, relations "nested" within relations -- sets of sets -- require logic higher than first-order predicate logic (FOPL), the formal foundation of RDM, which robs relational databases of their advantages and makes them not less complicated than the hierarchic predecessors they replaced.

Consequently, the information about the relationships between objects of different types should be represented by values rather than pointers, in compliance with the IP. If relations are defined only on 'simple domains' with atomic values -- are in what we call their 'normal form' -- FOPL and its advantages are preserved. The RVDs can be converted to relations "in their own right" and the pointers with FKs. A FK is a combination of one or more attributes of one relation that is not the PK, whose values match those of the PK of some other relation:


Thus, FKs are not just "a good database design habit", but "a user-oriented means" -- as Codd referred to them -- to represent information about relationships between objects of different types in relational databases. The DBMS enforces the matching via a 'referential constraint' on the relations representing the related object types. With relations in normal form and FKs, FOPL and the soundness and simplicity of relational databases are preserved.

Lately graph databases -- of which hierarchic databases are one type -- are reinvented as a superior alternative to relational databases. Data professionals young enough to have not been around the old versions to experience what FKs replaced should check them out (e.g., A few queries on IMS-DB) before they believe any arguments such as the one above. Those who forget the past ...

As to other constraints, stay tuned.

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.

The Importance of Understanding Classes, Sets, and Relations for Analytics

Failure to understand these fundamentals causes poor database designs and risks incorrect and/or improperly interpreted analytics results.

Understanding the Division of Labor between Analytics Applications and DBMS

Those who ignore data fundamentals will always risk costly mistakes and inhibit their own progress towards analytics goals. Here's why.

Re: keys vs. constraints
  • 6/12/2017 4:37:16 PM

Very lucky indeed. Don't leave the place.

Re: keys vs. constraints
  • 6/12/2017 4:09:46 PM

Well, I've been familiar with your advocacy for some years now -- it's appreciated.

I'm fortunate enough to work in a sort of old-school environment where no one questions that the designs for our OLTP applications should be based on solid RDBMS conventions. I feel lucky not to have to explain to upper management why we haven't converted our ASE implementations to some flavor of NoSQL by now.

Thanks for your responses.

Re: keys vs. constraints
  • 6/12/2017 3:16:58 PM

Incidentally a PK is a combination of one or more attributes that satisfies a PK constraint--uniqueness--in the relation--so "keys vs. constraints" is a rather misleading phrase.

Re: keys vs. constraints
  • 6/12/2017 3:13:23 PM

That type of statements in an industry characterized by a lack of and disregard for foundation knowledge reinforces confusion and misconceptions.

Constraints are not relevant for read-only databases, provided of course that the data originates in a database that DID enforce all the necessary constraints. That cannot be expected with current products and practices, but then applying constraints in the replicate stage is too late.


Re: keys vs. constraints
  • 6/12/2017 11:55:45 AM

Sorry for any confusion, but I was simply making a comment about the quotation you included and were responding to in your original post. I wasn't indicating agreement with the quotation and certainly understand referential integrity and its importance.

Occasionally the distinction between using a FK column as the basis for a join and actually defining the constraint comes up. For example, I have found that defining constraints in a read only replicate can degrade replication performance considerably, so typically I create such replicates without the FK constraints required in the primary. 

Re: keys vs. constraints
  • 6/1/2017 4:56:08 PM

That's simply wrong.

Constraints represent in the database the meaning of the reality that the database represents. Without the constraints you can enter arbitrary random data in the database and render it meaningless.

One of the core objectives of database management is to make constraint enforcement a database function and take it away from applications. If you don't know why that is crucial, may I suggest some education of data fundamentals.

If you don't have the DBMS enforce constraints you are (1) taking no advantage of database management (2) you are throwing out consistency and data integrity. Those who don't understand this should not use databases.




keys vs. constraints
  • 5/30/2017 3:39:21 PM

"Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory."

I just took this to be a sloppy way of saying that defined constraints are not mandatory.

Re: Clarifiication
  • 5/29/2017 5:55:20 PM

I dk what you mean by "direct".

A PK is a relation attribute that represents in the database an identifier of an object, a fact about which is represented in the database by a tuple that is displayed as a R-table row. Since objects are unique in the real world, a uniqueness constraint on the PK attribute ensures that its values are unique.

A FK is a relation attribute that represents in the database a relationship between objects of distinct types. Both PK and FK must be defined on the same domain i.e., they must represent the same object property (mean the same thing). Since the FK values must be consistent with the PK values, the two relations are subject to a referential constraint enforcing matching between the two sets of values.

I prefer to say that a PK represents identification and the FK a cross-reference.

  • 5/29/2017 5:29:56 PM

Am I correct in understanding that primary keys are a direct reference while foriegn keys are more of a cross reference?