The Necessity of Foreign Keys

Last month I showed how poor grasp of data fundamentals makes it difficult to understand different types of key and the necessity of primary keys (PK). There's another type of key that is poorly understood for the same reason:

    "... [we] wish to make a point. There is something which is bad design/good design/mandatory/optional. Please stop insisting that Primary and Foreign keys are mandatory. They are good design habits but by no means mandatory."

Really? FKs were introduced as a relational solution to the problems of hierarchic database technology of the late 60s. Consider the actual example E.F. Codd used in 1970 -- a four-object hierarchy -- when he introduced the concept:

In the hierarchic databases of the time there would be four "segments" -- record types -- connected physically by pointers that would have to be navigated for data access. For example, SALARYHISTORY data would be accessible only via EMPLOYEE records and pointer navigation through the JOBHISTORY records. In other words, information about relationships between objects of different types was represented by the 'physical organization' of the data. Aside from the complexity, applications accessing data for analytical purposes had to reference them explicitly and any reorganization would impair applications, necessitating prohibitive maintenance.

Relational systems support 'physical independence' (PI) -- .e., applications are insulated from physical details. Data access is based exclusively on information content -- data values -- not machine internals that are unknown and irrelevant to application developers and users. That is why the core relational Information Principle (IP) mandates 'all' information in a relational database -- including about relationships -- be represented in 'only one way': as domain values in relations.

If a relation is substituted for each record type:

JOBHISTORY, SALARYHISTORY and CHILDREN are attributes defined on non-simple domains -- specifically, 'relation-valued domains' (RVD) -- i.e., that have relations as values. As I explained in Data Sublanguages, Programming, and Data Integrity, relations "nested" within relations -- sets of sets -- require logic higher than first-order predicate logic (FOPL), the formal foundation of RDM, which robs relational databases of their advantages and makes them not less complicated than the hierarchic predecessors they replaced.

Consequently, the information about the relationships between objects of different types should be represented by values rather than pointers, in compliance with the IP. If relations are defined only on 'simple domains' with atomic values -- are in what we call their 'normal form' -- FOPL and its advantages are preserved. The RVDs can be converted to relations "in their own right" and the pointers with FKs. A FK is a combination of one or more attributes of one relation that is not the PK, whose values match those of the PK of some other relation:


Thus, FKs are not just "a good database design habit", but "a user-oriented means" -- as Codd referred to them -- to represent information about relationships between objects of different types in relational databases. The DBMS enforces the matching via a 'referential constraint' on the relations representing the related object types. With relations in normal form and FKs, FOPL and the soundness and simplicity of relational databases are preserved.

Lately graph databases -- of which hierarchic databases are one type -- are reinvented as a superior alternative to relational databases. Data professionals young enough to have not been around the old versions to experience what FKs replaced should check them out (e.g., A few queries on IMS-DB) before they believe any arguments such as the one above. Those who forget the past ...

As to other constraints, stay tuned.

Fabian Pascal, Founder, Editor & Publisher, Database Debunkings

Fabian Pascal is an independent writer, lecturer, and analyst specializing in database management, with emphasis on data fundamentals and the relational model. He was affiliated with Codd & Date and has taught and lectured at the business and academic levels. Clients include IBM, Census Bureau, CIA, Apple, UCSF, and IRS. He is founder, editor, and publisher of Database Debunkings, a Website dedicated to dispelling myths and misconceptions about database management; and the Practical Database Foundations series of papers. Pascal, author of three books, has contributed extensively to trade publications including DM Review, Database Programming and Design, DBMS, Byte, Infoworld, and Computerworld.

Why You Always Need Primary Keys

Database pros should heed this warning: if you ignore that primary keys are mandatory, you can wreak havoc with inferences made from databases, including in analytics.

The Trouble with Data Warehouse Analytics

Data warehouses are essentially databases biased for particular data applications and against others. They are rooted in poor database foundation knowledge and logical-physical confusion.