IPT - A Virtual Approach - Relational Schema

The central datatype for all relational systems is the TABLE. They behave like containers for your data.

Forming relationally acceptable tables 'in their best form' is the aim of this section of the course.

Normalisation

A Conceptual Schema consists of a set of fully normalised relations - that is fact types in elementary form. As part of the process of conceptualising a UoD, each elementary relation is scrutinised, instantiated and (for the most part) as good as they get.

'Good' database design balances a number of things - the number of NULLS is balanced against the number of tables in the resulting relational schema (sub-typing, as you know, can eliminate the need for nulls).

A CSD, while free of redundancy, may not always lead to an efficient table design (particularly if commonly required 'facts' need to be recreated from many disparate elementary facts). A database designer may choose (after conceptual analysis) to include columns that are redundant in order that certain operations are faster. This choice (called controlled redundancy) comes with enormous risk - update anomalies and referential integrity checks become very complicated (but not un-manageable) under such schemes.

Field typing, although explicit as reference modes on a conceptual schema, may need interpreting based how the implementation is to proceed. Numbers (6.5 as an example instance) could be stored as a real number (which occupies 6-12 bytes of memory) or text (1 byte per character = 3 bytes in this example). The choice bottom-lines on what the values are going to be used for. Certainly, if numerical quantities are required for mathematical calculation, then they should be stored as numbers. If, on the other hand no calculation is required on that numeric field, then storage as text may be justified - ordering, search and retrieval are just as rapid for a fraction of the memory cost.

Optimal Normal Form

Part of the process of conceptualisation is KEY recognition. Each relation instance is distinct from every other relation instance by way enforcement of the uniqueness constraint placed on that relation (during step 4 of ORM). It is appropriate to use the key of the relation to describe (in part) 'what the fact is about' (i.e. it's context).

When deciding what fact types should be grouped into which tables, it is reasonable to expect that relations that are about the same thing should be grouped together. Similarly, fact types that are keyed differently are clearly about different things and belong in separate tables.

Un-Normalised relations involve putting all fields in the one table. Clearly this is inefficient and inappropriate for a relational system in most cases. Incidently, this is the way the vast majority of non-relational database products store their data. Traditionally, Normalisation (First Normal Form up to 5NF) was employed in table design, successively refining tables, splitting and testing until the desired result was obtained - this is time consuming, prone to error and mis-interpretation.

For a given Conceptual Schema, several relational schemas (table designs) may be suggested. The Optimal Normal Form Algorithm (Nijssen & Halpin) aims to provide a simple, safe and efficient table design for a given conceptual schema. These designs have no repeating fields, no redundancy or update problems as each fact is stored only once, and a minimum number of tables to achieve this.

Using ONF, each fact type is mapped into only one table, with the roles sometimes mapping into column names. Since the fact type maps onto a single table, the uniqueness constraints of that fact type become the 'primary' keys for that table. Other uniqueness constraints provide secondary keys, sub-set and equality constraints become foreign keys which lace each of the tables together.

One problem inherited from a number of flavours of SQL (Oracle, Paradox, Access to name a few) is the distinction of a primary key for a table. The name implies that one key (the nominated primary) is more important than other uniqueness constraints (including foreign keys) - this is simply nonsense. The total uniqueness picture is as important as each individual key in defining the behaviour of data stored within the tables. Unfortunately, a number of the key conditions found in the real world are difficult to implement, depending on the SQL flavour chosen.

The Optimal Normal Form Algorithm

What follows is a simple method of conceptual-relational mapping. Simple and compound keys are explained as follows:

Simple Keys span a single role box; Compound/Composite Keys span more than one role box

(a) Group fact types with simple keys attached to a common object type into the same table. The primary key for this table is the simple key that was the basis of the group

Notice, the uniqueness constraint that is common for both fact types becomes the primary key for the resulting relation.

Nested fact types provide cases to apply the same rule as each attribute of the nested object has a simple (and common) key, therefore the attributes belong together as they are about the same thing

ONF on nested fact types

This picture becomes a little clouded in flattened form - look for 'object uniqueness' in groups of high arity facts as these may also be combined into single tables

Constraints spanning fact types either provide secondary keys, or form the basis for foreign keys. Notice that each of the displayed fact types share a simple key (hence the single table), with the inter-fact type constraint also in place.

(b) For each fact type without a simple key, create a separate table. Select the shortest key of the fact type as the primary key for that table

(c) For each fact type remaining, create a separate table. Use the key of the fact type as the primary key of the table.

When the fact type is a 1:1 (i.e. it has two simple keys) where one of the simple keys is common with a collection of other fact types). Such a situation calls for a little common sense and a decision as to whether the fact type belongs with the others or is about something else. A balance between the number of tables and the storage efficiency is called for here. Either way, the fact type will cause a secondary key to exist.

Sub-types present table designers with decisions - put up with nulls in order that a small number of tables is obtained OR increase the number of tables (each sub-type can be its own table) with no nulls.

When deciding on names for table fields, use either the role or the attached entity or some combination to ensure the name is meaningful.

The Data Dictionary

An often overlooked step in table design is the object type listing (otherwise known as the Data Dictionary). Prudent decisions regarding field types can make databases easier to manipulate, be efficient in terms of storage space and to a degree more flexible.

It is suggested that a list of every field, along with the suggested data types be compiled. The aim is minimise the number of different types used and standardise data types across related fields to ensure UNION COMPATIBILITY. This list will greatly simplify the process of table creation during implementation, and will allow some complex inter-field operations later.

As has been suggested, choice of numeric field types may not be efficient in terms of the related storage space. This is particularly so in those cases where mathematical computation is unlikely. The saving in storage space, particularly in large systems can be enormous, without the loss of performance or flexibility.

Relational Schema

Table Design

Normalisation

Optimal Normal Form

The Optimal Normal Form Algorithm

The Data Dictionary