General
General purpose
languages (Pascal, FORTRAN, COBOL, C...) allow users to solve many types
of problems but are rarely used to build commercial database applications
due to questionable efficiency. Whilst it is sometimes the case that
interfaces are built with lower level languages,
the database engine is almost always SQL in
some form or another.
As a contemporary
example, most large websites these days are driven by databases underneath
- the language used to store and manipulate the data in those databases
is usually some flavour of SQL. The languages to get that data to your
browser window, however, are hypertext pre-processing scripting languages
like PHP, ASP, JSP, Cold Fusion). They
are usually server-side meaning the executable code
is processed and the answers are sent to you from the server - these
languages are generally embedded in webpages and executed live. This
is how a chat forum can appear to be updating itself as you type.
Algorithmic and
scripting languages are inadequate (or at least verbose) when dealing
with database type data, creating files and accessing/modifying them.
To provide even the most rudimentary database functionality and ease
of maintenance is complex and prone to error in a low level language.
Delphi and other modern database applications can use an external
SQL engine and report generator to 'simplify' the creation of database
applications.
SQL (Structured
Query Language) is a FOURTH GENERATION DECLARITIVE LANGUAGE. It is considered
thus because (with declaritive systems) you specify WHAT you want it
to do, but are not required to tell it HOW to go about it. It is written
specifically to deal with information that appears to be stored in tables.
It is set-based, harnesses an internationally accepted syntax and can
deal with huge quantities of information very efficiently.
If you take a REALITY
CHECK for a moment, there are low-level storage structures that conform
to any notion of a table - primitive data types must still be at the
heart of the storage (integers, characters etc) and at a much lower
level, 1's and 0's on disk are interpreted at a much higher level in
table form.
SQL interfaces insulate
the user from the PHYSICAL LAYER, and presents (through much efficient
pre-processing) data in 'human friendly' tabular form. Humans are comfortable
with tabulated data, and indeed can deal with complex patterns of numbers
and characters very efficiently and intuitively in this form. Additionally,
a large body of set-based theory allows us to manipulate tables powerfully
and SQLs harness this conceptual strength to allow enormously complicated
things to be done simply.
SQL is not, however,
the 'bees knees' of languages. It will allow solutions to certain types
of problems, and is very powerful in those situations. Take SQL out
of a declaritive problem set and place it in a procedural realm and
you will soon realise it's limitations in solving those sorts of problems
suited to algorithmic languages. It is hopeless, for example, as a gaming
platform - that is not what it was designed for.
A RELATIONAL
DATABASE comprises (usually) a collection of tables. Each table
in a relational database has something to do with at least one other
table in that database system.
|
This simple
relational database comprises 3 simple tables (Albums, tracks
and Performers) with SerialNumber being the common link between
all 3 tables.
This is the
MUSIC database and you will be dealing with it and variations
of it in depth shortly. |
Tables contain RECORDS
(or rows) of objects organised into columns called FIELDS. In
the above example, the Performers table has 3 fields (SerNum, Artist
and Instrument). Each field has a type also (common field types include,
character, integer, real, ctring, date, currency, BLOB..).
MySQL (the 'flavour
of SQL we will start with) offers many field types, however we shall
initially concentrate on two: TEXT =char(n) and NUMBERS =decimal(n).
Eventually we will
choose from a wider range of data types available - the field type is
determined by the sort of information you are wanting to store there
- numbers or not-numbers is usually the first distinction. Within numbers,
there are lots of different styles of numbers (real, integer, date).
Non-numbers also encompass various storage classes including text, file
and so on. You can see a more complete set of data types in the MySQL
Dictionary associated with this section.
THE MUSIC DATABASE
The MUSIC database comprises
3 related tables....
ALBUMS Table
this table contains most general information
about each recording
7columns, each row is a single recording
TRACKS
Table
this table details the songs and their order to be
found on each disk of a particular recording
4 columns, one song per row
PERFORMERS
Table
this table summarises who played what instrument
on a particular recording
3 columns, one instrument per row.
The above database
will, through the course of instruction, become familiar to you as you
investigate it's contents. It may appear strange to you that the data
is stored in tables of this shape - DON'T PANIC, there is a GOOD
reason for the shape of the table containers - this is covered elsewhere
in the course.
SQL is DECLARITIVE
in nature - you tell it what to do, not how to go about doing it. It
is termed a RELATIONAL LANGUAGE (as it conforms to the Relational
Model of Data).
Information stored
using this type of language is called a RELATIONAL DATABASE, the controlling
software is termed the RELATIONAL DATABASE MANAGEMENT SYSTEM (RDMS)
The Relational Model Of
Data - Some Important Starting Definitions
Throughout this section,
aspects of the relational model of data will be explained. Presented
here are a selection of characteristic properties that typify data in
a relational database.
It is understood that:
- It is nearly always possible
to express information in table form
- A table, however, may
not be the most efficient storage form, nor may it be easily read.
Because we are using computers to assist us in manipulating and accessing
the information, the arguement of 'efficiency' of storage becomes
less and less relevant.
- There are often many ways
to convert data to tables but different representations make different
assumptions about the information.
example 1: a fragment
of bit-mapped graphics could be reduced to a series of pixel co-ordinates
and their corresponding indexed colour:
x y color
---------------------------
53 287 1
99 115 3
4 16 1
: : :
This 3
column table (a ternary relation) is one way of representing
the information stored in a bitmap's file. We could hand this data (in
some form) to a program that could recreate the image exactly, by placing
the coloured pixels in the correct screen coordinates.
NOTICE
that the row order is irrelevant to the overall image
example 2: A Simplified
Family Tree
It should
be noted that no assumptions have been made regarding the gender of
the offspring or their parents due to the likelihood that some names
are used equally for each gender (purists would also argue that you
have not been given this information and so cannot include it).
In both the above
cases, the relationship between the entries in each of the rows
are the SAME for each row, but NO ROW IS THE SAME AS ANY OTHER
ROW
Entries in each
row RELATE to each other (eg. coordinates together define a pixel, or
x is_parent_of_y)
TABLES are collections
of INSTANCES (tuples/rows/examples stored as records) of relationships.
Each instance in a table has the same relationship type as other rows
in that table, but a different one to rows in other tables in the system.
Good table headings
usually suggest what the relationships are. For example: in the Performers
Table
ser-num...features...artist...playing...instrument
The table's instances (ie.
rows/tuples in the table) form the tables' POPULATION. Providing
the table with instances is called POPULATING the table. An un-populated
table is termed a null table.
It is not possible to populate
the table with unrelated instances. As an example, it makes no
sense to provide meteorological data in a table about births, deaths
and marriages - regardless what the astrologers say :)
TABLE ARITY - provides
the degree of the relationship, or the number of objects involved
in the relationship, or the number of columns on the table.
tracks has ARITY 4(a quatanary
relationship)
performers has arity 3 (a ternary)
albums has arity 7
it is possible to have arity
1, with no instances - an empty (or null) UNARY table.
Normalised
Tables
The relational model
of data requires that each FIELD (or column entry) is a SINGLE ENTITY
(ie. no repeating groups). In other words, a column entry should
be only a single piece of information.
eg: ser_num songs
-------------------------------
NCC1701 song1, song2, song3
The above example is incorrect,
as both the column heading (a plural) and inspection of the data reveal.
This formation is not supported by the relational model of data,
and therefore should we accept it as part of our table system, we would
need to accept that some types of data retrieval requests (queries)
are either impossible or very inefficient to complete.
To re-work the same information
into a relationally acceptable form, we could have:
eg: ser_num song
-----------------
NCC1701 song1
NCC1701 song3
It should be noted that the
two tables above 'say' essentially the same thing, the latter table,
however, is easier or more straight-forward to access.
Tables without repeating
groups are termed FLAT(or normalised).
Column entries should not
be able to be divided (that is they should be SINGLE VALUED FACTS).
IF this is not the case, search and retrieval problems arise through
these repeating groups
eg mum dad children
------------------------------------
bob carol ted,alice,fred,olivia
This is really a repeating
field also, as we have listed child objects together in a single
field.
"Who is the father of
'fred'?" requires a messy search using sub-strings and partial
pattern matching (such a search becomes performance prohibitive on large
databases)
Row headings NOT ALLOWED
- a relationally acceptable table has rows in which order is not important
! Repetition of rows is also not allowed
eg Mon Tue Wed Thur
1 IPT ENG MI CHEM
2 MII GR GEO IPT
3 FR RE ART YAK PLATTING
is a 'trivial' example
where row headings (1,2,3...) allow the table to make sense - remove
them and we lose information.
The above information
could be transformed into a number of relationally acceptable forms
as detailed below:
either day period class
mon 3 FR
wed 1 MI
this formation allows
simple searching
or period mon tue wed thur
3 FR RE ART YAK
1 IPT ENG MI CHEM
Q: "When do you have
french?" involves a messy search
Relational table design should
never be dictated by the types of questions that are foreseen. The purpose
to which the database is to be put is important, but if it is designed
well, it can be used for many different things (with an increased liklihodd
that it can cope with unforseen queries as well).
Although these issues are
important to performance of the database system, using them as the sole
basis for design increases the liklihood that unplanned questions may
not have answers in the resulting tables.
Another common mis-conception
is the difference between how the data is presented and how it is stored
- these two aspects of the data may have little to do with each other.
Inded, in a well designed system, the data may be displayed with great
flexibility.
It is (unfortunately) possible
in poorly designed systems that certain questions have no easily attained
answer.
DON'T PANIC! ----> correctness
of table design is covered elsewhere
in the course.
Tables that have been stored
relationally in simple forms can be VIEWED differently - a VIEW
is an alternative presentation of stored information, displayed prettily.
Table Keys
Keys are essential in a relational
database. They are statements of UNIQUENESS. Keys are what is
used by the RDMS to tell the rows of a table apart from each other,
and how rows in one table relate to corresponding rows in another table.
Unfortunately, terminology
like Primary Key would tend to intimate that that type of key
is more important than any other type of key - this is plain nonsense.
All keys, be they 'primary', secondary, foreign or merely indexed fields
are important.
- PRIMARY
KEYS
This is a KEY
that uniquely identifies each row of a table. Each table should
have at least a primary key, else we lose the power to manipulate
the information fully, and maintaining the tables' data may not
be possible. It makes sense that rows of a table are all different
- why should we store the same record twice?
In the Music
Database, the primary keys are as follows:
- Albums =
ser_num
- Tracks =
ser_num + song
- Performers
= ser_num + artist + instrument
A
key can be a SIMPLE KEY (one column) or a COMPOSITE KEY (involving
2 or more columns in combination)
A key column
should NEVER BE NULL - that is, there should always be a unique
value supplied in a key column
The larger the
arity of the key, the more work the RDMS has to do to maintain the
integrity checking as data is added/modified. Large keys can seriously
impede system performance. In certain circumstances, an artificial
primary key (single column, like an ID) is created, to simplify the
indexing/updating of table rows.
- FOREIGN KEYS
A foreign key
in a table is a corresponding column that is a PRIMARY KEY in another
table.
The Music database
has two foreign key definitions -
In the tracks table, a particular SerNum must already exist
in the albums table: tracks.SerNum is a foreign key
referencing albums.
Similarly, performers.SerNum is a foreign key referencing
albums. This means that in order to enter performer or track
listing for any album, that album must already exist in albums.
eg. Suppose we create a new music table 'Origins':
Artist Home
char(20) char(30)
key
the Artist column is the
primary key of this table, in both the Albums and Performers tables,
that column now becomes a foreign key.
3. SECONDARY KEYS
-
A secondary key
is a column (or combination of columns) in a table that also has
no repeats contained in it. In Access, secondary keys are implemented
as Unique Indexes. A secondary key is put in place to reflect
a constraint (or rule) that exists in the universe the database
is made to work in. They are designed to provide extra data checking,
ensuring inconsistent data is not entered.
Key columns can be used to
ensure that the database retains its integrity - that is the
information is not allowed to be stored in an inconsistent form.
Foreign key definitions enforce
Referential integrity of the database. Be warned, however
that they can cause CASCADE effects. Consider the effect of deleting
an album - what effects would it have on the database???
Significant
Populations
A population is
SIGNIFICANT if there are sufficient instances to deduce the PRIMARY
KEY, and the underlying relationship binding the table together.