Communicating Facts
Facts (data/information) may be expressed in many ways - speech, writing, diagrams etc.
'Facts' may be true or false, but they are simple propositions, or information in (preferably) basic or ATOMIC form.
To avoid ambiguity (due to poor expression or nuances within the language), a more formalised natural language has been devised.
eg. Carol studies IPT
This fact may be True or False (she may or may not study this subject; we could, however, store the information regardless of its correctness). The fact comprises two objects (a person called 'Carol' and a subject with the acronym 'IPT') and a relationship (studies). The above fact is in INFIX form (the role is embedded amongst the objects).
We could restate the above fact without loss of information in PREFIX form as
studies('Carol','IPT')
Although the prefix form of the fact looks a little strange (unless you are Yoda: "... mmm Studies IPT Carol does"), it still says the same thing. PROLOG stores facts in prefix form, and usually calls them clauses.
The connecting term/phrase is called the predicate (or role) and provides the lexical or linguistic link between the connected objects (ie. the 'meaning').
Both formulations above are examples of a binary fact - ie. there are two entities involved in some relationship.
General binary fact form: entity predicate entity
Prefix alternative: predicate(entity, entity)
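The two general forms above can be sketched in code. This is only an illustration, assuming a binary fact is held as two entity labels plus a predicate; the class name BinaryFact and its field names are hypothetical and not part of the notes.

```python
from typing import NamedTuple


class BinaryFact(NamedTuple):
    """A binary fact: two entities joined by one predicate (role)."""
    first: str
    predicate: str
    second: str

    def infix(self) -> str:
        # General form: entity predicate entity
        return f"'{self.first}' {self.predicate} '{self.second}'"

    def prefix(self) -> str:
        # Prefix alternative: predicate(entity, entity)
        return f"{self.predicate}('{self.first}','{self.second}')"


fact = BinaryFact("Carol", "studies", "IPT")
print(fact.infix())
print(fact.prefix())
```

Both methods render the same stored triple, which is the point of the section: the choice of infix or prefix changes the notation, not the information.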
Entities are OBJECTS and can be either physical objects, things you can touch (eg. persons), or abstract objects (like a rating in a subject).
Fact types (sometimes called elementary sentences) come in pre-packaged shapes - that is, they are constructed with particular types of objects in mind.
eg. desk studies grossPay
The above attempt to use the fact type is obviously NONSENSE - clearly, certain FACTS rely heavily on CONTEXT and ENTITY TYPE.
FACT TYPES (or Sentence types) are only defined for the ENTITIES they were designed for. The following deep sentence structure analysis begins to explore the objects and the nature of their relationship:

Entity of type Person    studies    Entity of type Subject      is clearer

Entity of type Person    studies    Entity of type Subject      entity category
named                               with acronym                label category
'Carol'                             'IPT'                       instance category
When using a 4GL
or 5GL, we need to define the type of each label category (eg. integer,
char(8)...)
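A minimal sketch of why 'desk studies grossPay' is rejected, assuming each fact type records the entity type expected in each role. The class names and the entity types 'Furniture' and 'Amount' are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_type: str   # entity category, eg. "Person"
    label: str         # label instance, eg. "Carol"


@dataclass(frozen=True)
class FactType:
    first_type: str
    predicate: str
    second_type: str

    def accepts(self, first: Entity, second: Entity) -> bool:
        # A fact is meaningful only if both entities have the designed types.
        return (first.entity_type == self.first_type
                and second.entity_type == self.second_type)


STUDIES = FactType("Person", "studies", "Subject")

carol = Entity("Person", "Carol")
ipt = Entity("Subject", "IPT")
desk = Entity("Furniture", "desk")          # hypothetical entity type
gross_pay = Entity("Amount", "grossPay")    # hypothetical entity type

print(STUDIES.accepts(carol, ipt))
print(STUDIES.accepts(desk, gross_pay))
```

The first check succeeds and the second fails, because the fact type is defined only for a Person and a Subject.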
Facts may be UNARY (involving only 1 entity and a predicate)
eg: 'Fred' is_male or is_male('Fred')
'Julie' is_female or is_female('Julie')
'Olivia' is_male
where "Fred", "Julie" and "Olivia" are all examples of entity label instances.
Facts may also be TERNARY (involving 3 entities and 2 predicates)
eg: 'Phil' has_height 175 at_age 17
or has_height_at_age('Phil',175,17)
This fact is also ATOMIC, as it cannot be split into separate facts without the loss of information...
'Phil' has_height 175
'Phil' has_age 17
17 is_age_of_someone_with_height 175
The 're-expression' of facts as simpler facts can often lose the relationships between specific entity instances.
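The loss of information can be demonstrated with a small sketch. It assumes a few extra sample facts that are not in the notes: a second measurement for 'Phil' and a hypothetical second person 'Jo'. The ternary facts are split into the three binary fact types shown above and then recombined.

```python
# Ternary facts: (person, height in cm, age in years).
# Only ('Phil', 175, 17) comes from the notes; the other two are hypothetical.
ternary = {("Phil", 175, 17), ("Phil", 180, 18), ("Jo", 175, 18)}

# Split into three binary fact types.
has_height = {(p, h) for p, h, a in ternary}
has_age = {(p, a) for p, h, a in ternary}
age_with_height = {(a, h) for p, h, a in ternary}

# Try to rebuild the ternary facts from the binary ones.
rejoined = {
    (p, h, a)
    for p, h in has_height
    for p2, a in has_age
    if p == p2 and (a, h) in age_with_height
}

# Facts that the binary versions imply but were never stated.
spurious = rejoined - ternary
print(spurious)
```

The rebuilt set contains ('Phil', 175, 18), a fact that was never stored: the binary facts no longer record which height belonged to which age for Phil.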
name = reference      cm = reference      yrs = reference
person = entity       length = entity     age = entity
"Phil" = instance     175 = instance      17 = instance
Sometimes, a fact involving 3 or more entities may be split into simpler facts without the loss of information:
'Olivia' enjoys_eating 'Broccoli' and 'Tripe'
same as: 'Olivia' enjoys_eating 'Broccoli'
'Olivia' enjoys_eating 'Tripe'
Notice that the statement of two simpler facts represents the same information, and uses the general fact type:
entity of type person enjoys_eating entity of type food_type
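By contrast with the height and age example, this split is lossless. A minimal sketch, assuming the conjunctive fact is just shorthand for one binary fact per food; the helper names are hypothetical.

```python
def split_conjunction(person, predicate, foods):
    """Split one conjunctive fact into one binary fact per food."""
    return {(person, predicate, food) for food in foods}


def regroup(facts):
    """Rebuild the conjunctive form from the binary facts."""
    grouped = {}
    for person, predicate, food in facts:
        grouped.setdefault((person, predicate), set()).add(food)
    return grouped


simple_facts = split_conjunction("Olivia", "enjoys_eating", ["Broccoli", "Tripe"])
print(sorted(simple_facts))
print(regroup(simple_facts))
```

Regrouping the two binary facts gives back exactly the foods in the original statement, so nothing was lost by splitting.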
Fifth Generation Architecture
If 2 humans are discussing a UoD (Universe of Discourse), it should be possible to replace one of the humans with a 5GIS (fifth generation information system) with an appropriately specified UoD, without significant change in the language used for communication.
CONCEPTUAL SCHEMA = description of UoD including facts and rules.
        |
        |
CONCEPTUAL INFORMATION PROCESSOR -------- USER
        |
        |
INFORMATION BASE - stored somewhere, somehow, but whose 'state' is understood.
CIB - The Information Base is initially empty, and with each update request the STATE changes. For argument's sake, imagine the IB is designed to contain facts about war, and only has the following fact:
America is_ally
By the inclusion
of a country in an is_ally relationship, we explicitly state
who is our friend. Does the omission indicate the converse? Clearly
we need to be careful what inferences we draw from data that is not
included.
If we add the fact:
Japan is_enemy, we have fundamentally altered the state of
the information base - a STATE TRANSITION has occurred.
A CONCEPTUAL SCHEMA contains a formalised description of the UoD, and consists of:
- Stored Fact Types = kinds of facts which can be stored = types of entities + reference + roles
- Constraints = restrictions applying to fact types = static constraints that apply to all DB states + dynamic constraints that forbid certain changes. Also known as VALIDATION RULES
- Derivation Rules = list of functions, operations + rules used to derive info not explicitly stored in the DB (inc. calculations, eg. +, -, *, sum, avg...). Derived fact types are also listed as rules (rather than stored as fact types, to avoid update anomalies)
- Transition Rules = allowable state transitions for facts already stored (eg. married --> divorced, single --> married, married --> widowed, widowed --> married)
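Transition rules of this kind can be sketched as a set of permitted (old, new) pairs. The sketch below encodes only the four example transitions listed above; a real schema would likely list more, and the function name is hypothetical.

```python
# Only the transitions given as examples in the notes.
ALLOWED_TRANSITIONS = {
    ("single", "married"),
    ("married", "divorced"),
    ("married", "widowed"),
    ("widowed", "married"),
}


def transition_allowed(old_status, new_status):
    """Return True if the schema permits changing old_status to new_status."""
    return (old_status, new_status) in ALLOWED_TRANSITIONS


print(transition_allowed("single", "married"))
print(transition_allowed("single", "divorced"))
```

The first change is permitted and the second is not, since a person cannot go from single to divorced without first being recorded as married.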
To transact with the database, we use a CONCEPTUAL INFORMATION PROCESSOR (CIP), which has three main functions:
- Design Filter - humans describe the UoD; with CIP approval, a conceptual schema is formed. Changes to the UoD are also overseen by the CIP
- Data Filter - the user updates facts stored in the IB; the CIP checks for consistency, and if OK the changes are committed
- Info Supplier - the user queries about the UoD; the CIP replies by first drawing the answers from the IB
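The data filter and info supplier roles can be sketched with the war example from earlier. This is a toy illustration under stated assumptions, not how any real CIP is built: the class and method names are hypothetical, the constraint that no country is both ally and enemy is invented for the example, and rejecting a fact that is already stored is also an assumption.

```python
class InformationBase:
    """Toy information base guarded by a data filter (hypothetical design)."""

    FACT_TYPES = {"is_ally", "is_enemy"}   # stored fact types

    def __init__(self):
        self.facts = set()                 # the current STATE, initially empty

    @staticmethod
    def _consistent(facts):
        # Assumed constraint: no country is both an ally and an enemy.
        allies = {c for c, pred in facts if pred == "is_ally"}
        enemies = {c for c, pred in facts if pred == "is_enemy"}
        return allies.isdisjoint(enemies)

    def add(self, country, predicate):
        """Data filter: check the update, then commit or reject it."""
        if predicate not in self.FACT_TYPES:
            return "rejected: unknown fact type"
        if (country, predicate) in self.facts:
            return "rejected: fact already stored"
        candidate = self.facts | {(country, predicate)}
        if not self._consistent(candidate):
            return "rejected: constraint violated"
        self.facts = candidate             # commit: a state transition occurs
        return "accepted"

    def is_recorded(self, country, predicate):
        """Info supplier: report only what is stored, not what is omitted."""
        return (country, predicate) in self.facts


ib = InformationBase()
print(ib.add("America", "is_ally"))    # accepted
print(ib.add("Japan", "is_enemy"))     # accepted
print(ib.add("Japan", "is_ally"))      # rejected: constraint violated
print(ib.add("Japan", "likes"))        # rejected: unknown fact type
```

Note that is_recorded answers only whether a fact is stored. A country missing from is_ally is not thereby an enemy, which matches the earlier caution about drawing inferences from data that is not included.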
CIP Problem 1
Stored Fact Types:
F1 Person IsBrotherOf Person
F2 Person IsParentOf Person
F3 Person Has Height
Constraints:
C1 Each Person's Height must be recorded
C2 Each Person has at most one Height
C3 Nobody is his/her own Brother
C4 Nobody is his/her own Parent
Derivation Rules:
D1 avg(Height) returns the average of the recorded heights
D2 X IsUncleOf Y if X IsBrotherOf Z and Z IsParentOf Y
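The two derivation rules can be sketched as functions over the stored facts. The sample facts below are hypothetical and deliberately not the ones used in the exercise; the function names are also assumptions.

```python
# Hypothetical stored facts (not the exercise data).
is_brother_of = {("Tom", "Ann")}                 # F1: X IsBrotherOf Z
is_parent_of = {("Ann", "Joe")}                  # F2: Z IsParentOf Y
height = {"Tom": 180, "Ann": 170, "Joe": 160}    # F3: Person Has Height


def avg_height(heights):
    """D1: the average of the recorded heights."""
    return sum(heights.values()) / len(heights)


def uncles_of(y):
    """D2: X IsUncleOf Y if X IsBrotherOf Z and Z IsParentOf Y."""
    return {x for x, z in is_brother_of if (z, y) in is_parent_of}


print(avg_height(height))
print(uncles_of("Joe"))
```

Neither result is stored anywhere: both are computed from the stored fact types on request, which is why derived facts are listed as rules rather than stored.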
The conceptual schema above can be represented diagrammatically using a Conceptual Schema Diagram as follows:
Determine the CIP
responses to the following requests (assuming the Database is EMPTY
initially, and each request is processed before the next is issued):
+ Person "Alan" Has Height 178
+ Person "Alan" IsBrotherOf Cat "Felix"
+ Person "Alan" IsBrotherOf Person "Sue"
+ Person "Sue" Has Height 170
+ Person "Alan" IsBrotherOf Person "Sue"
+ Person "Alan" IsBrotherOf Person "Alan"
+ Person "Alan" Has Height 180
- Person "Alan" Has Height 178
+ [ - Person "Alan" Has Height 178 + Person "Alan" Has Height 180 ]
+ Person "Mary" Has Height 175
+ Person "Sue" IsParentOf Person "Mary"
+ avg(Height) = 174
Suppose the following
queries were legally expressed, and the database was complete with facts
from the previous section:
avg(height) ?
Who IsUncleOf "Mary" ?
Who IsAuntOf "Mary" ?
Suppose this additional information was added to the UoD:
F4 Person Eats Food
and
C5 Each Person Must be recorded to Eat a Food
If the database
started out EMPTY again, what would the CIP respond with in each of
the following cases:
+ Person "Bob" IsBrotherOf Person "Anne"
+ Person "Anne" Has Height 170
+ [ + Person "Anne" Has Height 170 + Person "Anne" Eats Food "Spinach" ]
+ [ + Person "Bob" HasHeight 175 + Person "Bob" Eats Food "Apple" + Person "Bob" Eats Food "Spinach" ]
+ Person "Bob" IsBrotherOf Person "Anne"
CIP Problem 2
Stored Fact Types:
F1 Person Has FitnessRating
F2 Person Plays Sport
F3 Person IsExpertAt Sport
Constraints:
C1 Each Person has at least one FitnessRating
C2 Each Person has at most one FitnessRating
C3 FitnessRatings are denoted by integers 1..10
C4 Each Person IsExpertAt at most one sport
C5 X IsExpertAt Y is stored only if X Plays Y is stored
Derivation Rules:
D1 count(Plays Y) returns the number of players of Y
D2 X IsFootballer IF X Plays League OR X Plays Union OR
X Plays AustralianRules
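Some of these rules can be sketched as simple checks: the value constraint C3, the subset constraint C5, and the derivation rules D1 and D2. The sample facts and function names are hypothetical and are not the exercise data.

```python
FOOTBALL_CODES = {"League", "Union", "AustralianRules"}

# Hypothetical stored facts for F2: Person Plays Sport.
plays = {("Kim", "Union"), ("Kim", "Chess"), ("Lee", "Chess")}


def rating_ok(rating):
    """C3: FitnessRatings are denoted by integers 1..10."""
    return isinstance(rating, int) and 1 <= rating <= 10


def expert_ok(person, sport, plays_facts):
    """C5: X IsExpertAt Y may be stored only if X Plays Y is stored."""
    return (person, sport) in plays_facts


def count_players(sport, plays_facts):
    """D1: the number of players of the given sport."""
    return sum(1 for _, s in plays_facts if s == sport)


def is_footballer(person, plays_facts):
    """D2: X IsFootballer if X plays League, Union or AustralianRules."""
    return any((person, code) in plays_facts for code in FOOTBALL_CODES)


print(rating_ok(11))
print(expert_ok("Lee", "Union", plays))
print(count_players("Chess", plays))
print(is_footballer("Kim", plays))
```

The uniqueness constraints C2 and C4 and the mandatory role C1 are left out here; they need the whole information base state rather than a single fact, so they belong in the update check itself.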
The conceptual schema above can be represented diagrammatically using a Conceptual Schema Diagram as follows:
If the Database
is initially Empty, what would the CIP response be to the following
update requests:
+ Person "Gladys" Has FitnessRating 9
+ Person "Fred" Plays Sport Soccer
+ Person "Bob" Has FitnessRating 7
+ Person "Gladys" Has FitnessRating 8
+ Person "Olivia" Has FitnessRating 7
+ Person "Freda" Has FitnessRating 15
+ Person "Gladys" Plays Sport League
+ Person "Bob" IsExpertAt Sport BonsaiTreeClimbing
+ Person "Gladys" IsExpertAt Sport League
+ Person "Olivia" ProgramsIn Language Cobol
+ Person "Gladys" Plays Sport Badminton
+ Person "Freda" Plays Sport Judo
- Person "Freda" Has FitnessRating 7
+ [ + Person "Bob" Has FitnessRating 8 - Person "Bob" Has FitnessRating 7 ]
+ Person "Gladys" IsExpertAt Badminton
+ Person "Bob" Plays Sport Soccer
+ Person "Freda" Plays Sport Judo
Assuming the following
queries are legally expressed, and the database is loaded with the facts
from the previous section:
Person "Gladys" Plays Sport League ?
Who Plays Sport Judo ?
count (plays soccer) ?
Who is Footballer ?
What FitnessRatings are permitted ?
What is the meaning of life, the universe and everything ?