Relational databases consist of relations -- sets of facts about property-sharing entities (relations are pictured as R-tables; R-table, row and column are shorthands for the underlying relation, tuple, and attribute). Every query executes an operation of the relational algebra that derives a new R-table from database R-tables. Its rows represent facts that are logical implications of the database facts. A RDBMS is, in fact, a logical inference engine that guarantees logical correctness of query results: if the database facts are true, so will be the derived facts. A core practical advantage of the relational data model (RDM) is physical data independence (PDI): queries and applications do not have to be revised when databases are reorganized physically (in storage).
Designs that adhere the Principle of Full Normalization (POFN) minimize integrity risk and ensure semantic correctness i.e., they do not produce query and update anomalies. Informally, a R-table is fully normalized -- which, for our purposes here means in fifth normal form (5NF) -- if it represents exactly one entity class. For example, two related entity classes, departments and employees, would be represented by two R-tables in a POFN-compliant database.
In practice, either by mistake or design, the POFN is violated by “bundling” multiple classes into single R-tables, in one of two ways. The first is to nest facts about one class within facts about another and using one R-table to represent them. For example, if employee facts are nested within department facts, one R-table can represent both.
EMPLOYEES is a column defined by the database designer on a non-simple domain, i.e., one that has components and, therefore, non-atomic values. In this particular case the values are R-tables and the components are columns and rows.
A data language with access to attribute components is very powerful (e.g., queries can restrict/project/join inner and outer tables), but inferencing from nested (compound) facts is prohibitively more complex and prone to error. This complicates table and constraint definitions, query expressions and interpretation of query results. What is more, such languages are not declarative -- they require logic higher than first order logic, which introduces self-referencing and undecidability -- defeating PDI. The RDM trades the expressive power of second order logic for the decidability and simplicity of first order languages and PDI.
A R-table with attributes defined only on simple domains takes a less convoluted form -- a normal form -- devoid of nesting. If R-tables are in the preferred normal form i.e., components meaningful to applications (here, employee attributes) are simple domains in their own right and a true RDBMS enforces value atomicity -- first order logic is sufficient. This imposes some limitations on the expressive power of data languages, but they are declarative and PDI and simplicity are preserved. A true RDBMS enforces atomicity via a data language that does not allow applications to access attribute components not explicitly defined on their own domains.
In the special case in which a R-table has an attribute defined on a non-simple domain that happens to be a relation-valued domain (RVD) like EMPLOYEES, it is possible to normalize it by un-nesting the inner R-tables into a separate R-table, as in the two-table design above. Note that both R-tables are in their normal form, devoid of non-simple domains (strictly speaking, this works only with RVD’s that have union-compatible R-tables i.e., with the same type and number of columns as values).
For various reasons, SQL DBMS’s are not truly relational. They:
- Do not support true (user-defined) domains and—a blessing—RVD’s either (Oracle supports “nested tables” which are not true RVD’s);
- Implement special functions that give access to components of domains defined as simple by database designers, a subversion of atomicity and, by creating domains and R-tables on the fly “behind the back” of the DBMS of the RDM too. Analysts should ascertain that this facility is not abused.
The normal form is currently known as first normal form, which has to do with the second way of bundling. On that and further normalization beyond 1NF in future posts.