IPT - A Virtual Approach - Correlation and Existence

Sub-Queries are evaluated by the SQL optimizer. It is normal that a sub- query is evaluated ONCE and then the result is passed to the outer query.

A Problem: With the MUSIC database, get the system to list all artists along with the name of their most recent album

NOTE the existence of BOTH a join condition and a subquery.

select alb_name, artist, p_date
from albums A
where p_date in

(select max(p_date)
from albums
where artist = A.artist)

NOTE: there is an explicit reference to an outer query column. Whenever a sub-query 'reaches out' to the outer query, the sql optimizer is forced to RE-EVALUATE the subquery for each row of the outer query. This takes time and can rapidly increase processing time - should be avoided UNLESS THERE IS NO OTHER WAY

alternate mSQL:

select alb_name, artist, max(p_date)
from albums
group by artist

Correlation with large tables VASTLY increases processing and retrieval time, even if compared with a non-correlated query that does the same job but takes longer to write. We should use group by, join or temporary tables to avoid correlation if possible.

Correlation example 2: People who are cousins of Gina.

select person
from family <--- people with parents that are children of gina's grandparents
where parent in

(select person
from family <--- parents with gina's grandparents as parents
where parent in

(select parent
from family <--- gina's grandparents
where person in

(select person
from family <--- gina's parents
where person = 'gina')))

this query will result in an answer table that contains GINA, and her brothers and sisters, along with her cousins

a correlated revision of the above query....

select person
from family A <--- people with parents that are child of gina's grandparents
where parent in

(select person
from family <--- parents with gina's grandparents as parents
where parent in

(select parent
from family <--- gina's grandparents
where person in

(select person
from family <--- gina's parents
where person = 'gina'
and parent ^= A.parent)))

This query is very processor intensive, and takes a significant time on a slow processor.

An Alternative Formulation..... Gina's cousin revisited

select distinct person
from family A, family B, family C, family D
where a.parent = b.person
and b.parent = d.parent
and c.parent = d.parent
and c.parent ^= a.parent
and a.person = 'gina'

This executes faster than the correlated solution even though it is a 4 table join.

EXISTS OPERATOR

Question:list the total number of artist names that have albums with track listings (i.e. songs in the tracks table)

select count(artist)
from albums
where ser_num in

(select ser_num
from tracks)

select albname, artist
fom albums A
where exists

(select *
from tracks
where ser_num = A.ser_num)

The EXISTS operator is used to test if there is at least one row in the sub- query table that matches the outer query condition. If there is, then the outer query is evaluated.

NOT EXISTS is also available.

CORRELATION - a final example...

In the table, you are required to produce a list of those letters that are duplicated..

select letter
from hangmanletters
group by letter
having count(letter) > 1

NOW write a query that deletes from the table all duplicates from the table.

delete from hangmanletters
where letter in

(select letter
from hangmanletters
group by letter
having count > 1)

results in....the wrong answer

This can be explained by the fact that the subquery is evaluated only once, and therefore those found to be duplicated in the table are deleted (every occurrence of them)

solution:

delete from hangmanletters H
where letter in

(select letter
from hangmanletters
where letter = H.letter
group by letter
having count(letter) > 1)

NOTE the correlation