SQL returns sets of results - the result table is a set of tuples.
We have used in and union as set oriented operations.
Also available are the following:
any - meaning some member of
all - meaning every member of
any
1965 in (1962, 1965, 1971) is true
1965 = any (1962, 1965, 1971) is true
(is birthyear some member of the set(...))
Problem:
1962 ^=any (1962, 1965, 1971) is true
this is not the same as NOT IN
1962 ^=any (1962) this is false
this is the only case where it is
all
1962 =all (1962, 1965, 1971) is false
1962 <> to every member
1967 ^=all (1962, 1965, 1971) is true
1962 ^=all (1962, 1965, 1971) is false
1962 >=all (1962, 1965, 1971) is false
1971 >=all (1962,1965,1971) is true
it is also the LARGEST
In general, we use not in rather than ^=all
Sub-Queries
Sub-queries are, in general terms, selects that form sets of values
for use inside other selects
select col
from tab
where col OPERATOR
(select col
from ....)
where the OPERATOR is a set operation (in, ^=all, ... =)
In the above diagram, set intersection (underneath) and set difference
(on the top) is illustrated. The corresponding query is:
select albname
from Albums
where PDate between 1990 and 1995
and Recco = 'EMI'
and Media = 'LP'
and sernum not in
(select sernum
from performers
where instrument = 'Piano Accordian')
and sernum in
(select sernum
from tracks
where song like '%buffalo%')
NOTES ON SUBQUERIES
- The equals sign ('=') is only useful for subqueries that are single
value returns
- In order of precedence, sub queries are calculated FIRST, and normally
ONLY ONCE
- The returning result from the sub-query must be compatible with
the condition linked to it.
- A sub-query must return a SINGLE SET (ie. one column) of results.
Should a multi-table return prove necessary, one solution is to create
a view or temporary table that contains the desirec field collection.
- It is INVALID to ORDER BY in a sub-query ==> set theory tells
us that the returning set has no inherent order
- It is INVALID to use DISTINCT in a sub-query ==> again, set theory
tells us that the returning set is automatically distinct
UNION, INTERSECTION and DIFFERENCE
Unioning two (or more) type compatible fields allows us to 'merge'
the two populations to form a single, distinct list.
select artist
from performers
union
select artist
from albums
order by 1 asc
The above query results in a distinct list of artists (as set operators
exclude all repeats) - SINGLE copies of all the artist names in the
performers table, with any new ones added from the albums table.
To use ORDER BY for unioned queries, you MUST use column numbers and
not names (as the result table column contents come from a composite
source)
select sernum, artist
from albums
union
select sernum, artist
from performers
order by 2
The above query will be aranged by the ARTIST column
Query Formation
In order to identify what the query you must write is, you first must
identify where the information resides (ie. which columns, tables..)
and what has to be done to that information before you want to see it
(column functions, grouping, ordering...)
eg:In the CLASS database, list the full names of people that share
hobbies in common with Fred Smith
select name, fname <-- people
from student
where sid in
(select sid <--sids that like one or more of Fred's hobbies
from enjoys
where hobby in
(select hobby <--Fred's hobbies
from enjoys
where sid in
(select sid <--Fred's sid
from student
where sname = 'Smith'
and fname = 'Fred')))
JOINS
There are MANY 'flavours' of join available in mSQL, and some correspondingly
powerful operators.
A detailed examination of Joins
is available in a paper written by Mr Peter Whitehouse for the QSITE
State Conference in 1999
The CROSS Join
When you take every row in one table and join it to every row from
another table, you are said to be forming the Cartesial Join or a CROSS
JOIN.
Such a join is 'unconstrained', and in many situations, provides useless
information as rows of data are connected to otherwise un-related information.
For Example:
gpsSchools
|
The following is an example of a CROSS JOIN, that partially re-works
the GPS Playlist problem:
select P.school, Q.school
from gpsSchools P CROSS JOIN gpsSchools Q
The result table would contain 2 copies of every game, plus games
where teams play with themselves - a less than useful result.
The cross join above is functionally identical to the following
code:
select P.school, Q.school
from gpsSchools P, gpsSchools Q
In the above code, the COMMA (,) is a JOIN OPERATOR when used
in the FROM clause. By default, mSQL performs a CROSS JOIN when
a comma is used.
|
The NATURAL INNER Join
When you join two tables, ensuring that ONLY related rows are placed
together, you are usually performing an INNER join.
To connect, in the Music Database say, album details to Track information,
we could use the following query:
select albname, artist, song
from albums A, tracks T
where A.serNum = B.serNum
In this query, we explicitly state the join condition (that is what
must be correct if two tuples are to be connected). Another formation
of the same query is as follows:
select albname, artist, song
from albums NATURAL INNER JOIN tracks
or indeed, just:
select albname, artist, song
from albums JOIN tracks
In both of the above queries, only rows in the Albums table that have
corresponding Tracks entries are displayed.
This becomes complicated when tables have columns with the same names
but where you DON'T want those columns to be part of the join (remembering
mSQL will associate them BECAUSE they have the same name)
Should a MULTI-column natural inner join be necessary, then the USING
clause can be used, using the following syntax guide:
select col {,col}
from table1 JOIN table2 using (col,col {,col})
As an example, a query that would list Album Names of those that feature
the artist also playing an instrument on that album:
select distinct albName
from albums join perforers using (artist, sernum)
Since, in the music database different conventions have been used for
storing names (Surname, Firstname VS Firstname Surname), the above query
only finds those artists that go under a single name (eg. Bjork and
Vangelis)
All of the above examples rely on the joined columns having the same
names in both of the joined tables. Often, however, this is not the
case, and we are faced with using different named columns as the basis
of joins. The ON clause can be used to overcome the difficulty, as follows:
select col {,col}
from table1 JOIN table2 ON (colFromTable1 = colFromTable2)
The LEFT OUTER Join
The first table mentioned in a FROM clause is termed the LEFT table.
A LEFT OUTER JOIN is used when you require all of the population of
the LEFT table's nominated column(s) and either RIGHT table data that
corresponds or NULLS if no data corresponds on the RIGHT.
To illustrate, consider a query that delivers AlbNames and corresponding
tracks on that album. We could perform a natural inner join thus:
select albName, song
from albums JOIN tracks
In the query above, NOT ALL of the AlbNames are present in the answer
table - only those that actually have tracks recorded.
select albName, song
from albums LEFT OUTER JOIN tracks
The above query will contain ALL of the albNames - the song column
will contain NULLS in those situations where track listings are missing
for an album.
The RIGHT OUTER Join
The RIGHT OUTER JOIN is similar to the LEFT OUTER JOIN, only the 'dominant'
table is the RIGHT table (all of it's tuples will be present, accompanied
by either corresponding data or NULLS from the left table).
These join types are VERY useful when it comes to creating datasets
for database applications.
JOIN or SUB-QUERY?
Consider the table: Data
Person char(15)
Sex char(1)
Birth_yr dec(6)
problem:list persons born in the same year as someone else, together
with that year
MySQL: (using sub-q)
select person,birth_yr
from data
where birth_yr in
(select birth_yr
from data
group by birth_yr
having count(person) >1 )
(using join)
select a.person, b.person, a.birth_yr
from data a, data b
where a.birth_yr = b.birth_yr
and a.person < b.person
MySQL is OPTIMISED to allow
JOINS to be evaluated faster than the recently added sub-queries. This is, however, far from standardised between 'flavours' of
SQL. Microsoft's Access SQL, for example, is optimised to evaluate Joins,
and sub-queries in Access SQL can take a long time to deliver answers.
Some diagram symbology for common multi-table formations
|
select a
from X
where a in
(select b
from Y)
|
this is a conjunction |
|
select a
from X
where a NOT IN
(select b
from Y)
|
this is a disjunction |
|
select X.a
from X,Y |
select X.a
from X CROSS JOIN Y |
|
select X.a
from X,Y
where X.a = Y.b
|
select X.a
from X NATURAL INNER JOIN Y |
select X.a
from X JOIN Y |
|
|
select X.a
from X LEFT OUTER JOIN Y |
|
select X.a
from X RIGHT OUTER JOIN Y |