Re: 3vl 2vl and NULL
- From: "dawn" <dawnwolthuis@xxxxxxxxx>
- Date: 10 Dec 2005 05:22:38 -0800
Hugo Kornelis wrote:
> On 9 Dec 2005 07:43:37 -0800, dawn wrote:
> (snip)
> >> "The age of Aunt Marge is 47 years" is a fact.
> >> "The age of Uncle Vernon is unknown" is also a fact - but it's not the
> >> same fact type as the former.
<snip>
> >I don't have a problem with including a NULL marker, but with its
> >interpretation in the languages of the database. It is feasible to
> >include a null and yet use a 2VL. Sure there are issues, but not so
> >many as with a 3VL and it is far simpler to understand and employ.
>
> Are you sure?
>
> "Show me all family members who are older than Aunt Marge" - should
> Uncle Vernon be included or not? Why?
No. We don't know Uncle Vernon's age and we can only work with the
facts we know.
> "Show me all family members who are NOT older than Aunt Marge" - should
> Uncle Vernon be included or not? Why?
Two choices of how to ask the question --
1) Show me all members who have an age where that age is less than or
equal to Aunt Marge's age
In this case Uncle Vernon does not show up because he does not have an
age
2) Show me all family members who do not have an age greater than Aunt
Marge's
In this case Uncle Vernon does show up
> "Show me all family members who are older than Uncle Vernon" - should
> this return noone, everyone, or only some? Why?
Here we are asking a question of comparison to a fact we do not have,
right? So it is a bad question that the user should recognize when
they get the results and can debug it from there. It is a garbage-in,
garbage-out issue. The response from the query language could be all
family members whose ages are not null. That is how the implementation
I use would handle it. The null sets all equal to Uncle Vernon's null
set for that data.
>
> In all these cases, you run into the problem that Uncle Vernon's age is
> not in the database.
Yes, you are doing comparisons against a lack of a fact, so to use a
two-valued logic, the question is changed to:
How does something compare to the fact (since we can only work with
facts and T/F with 2VL) that we have no age for Uncle Vernon.
> With or without databases, fact remains is that you
> don't know if Aunt Marge is older, younger or the same age as Uncle
> Vernon.
That is correct. We don't know and a 2VL needs to answer based on what
it does know. So, give it your best shot based on the data you have,
not the data you don't.
> This proves that, as soon as you allow values to be missing in
> your database, True and False are no longer the only possible results of
> a predicate.
No it doesn't. We can still define a 2VL on this data as indicated.
> Reality forces 3VL on you.
Why not 4VL or greater?
We need not have the database responses model all human responses.
People understand a T/F response and intuitively understand that the
computer cannot know everything and has to do what it can with the
questions asked.
This analogy is a little far off the path, but it is what popped into
mind and does help explain my point. The Apple Newton developers
worked very hard on hand-writing recognition, but failed to a large
extent in having satisfied users related to the handwriting recognition
feature. The palm pilot told humans how to make their letters and had
much greater success with this data entry from a pen approach.
Humans understand the limitations of a computer and can roll with it.
Trimming to a 2VL is really quite easy for end-users and developers to
work with. It isn't without issues, but seems to have fewer than a 3VL
has.
>
> Using 2VL might be simpler to understand and employ, but it can't
> properly depict all facets of reality.
And a 3VL can?
Another angle -- think of how the RM folks reply to my interest in
having list constructs in the dbms. Surely there are ordered and
unordered lists "in reality." But their point is that if you can
contrive a list through the RM by adding more tables, ordering
attributes, and writing your own inserts into those lists to renumber
items after that point, you can get the job done without adding more to
the language. In the case of lists, I agree this is possible, but
think it is a shame for the language not to do this for me.
We then reverse our positions on the 3VL vs 2VL where my take is that
you can get the job done with a 2VL and it is simpler than a 3VL, so
let's go with it.
What is consistent in my position on these two matters? In both cases,
I choose the approach that is easiest for the humans involved, whether
ot not it is the simplest mathematical model. These are human-computer
interaction questions to me. I don't care what goes on under the
covers (everything for the human, including performance, being equal)
but want the person working with the database tools/languages/api to be
as productive as possible.
> I prefer a model that is faithful
> to the reality I try to model over a model that's easy but wrong.
I know that the database itself at any point in time is not modeling
all of reality. Not only can we trim back on facts (not recording
every known fact) but on the way we process the facts by using a 2VL,
for example. The dbms is a model of only some aspects of reality. If
you were to list all possible human responses to the questions you list
above, there are many we do not want to model in the computer. For
example
1) Why are you asking that?
2) But Uncle Vernon died yesterday (or even last year).
3) I don't know (even if all facts are known)
4) I could use a glass of water
etc.
I think I can safely claim that we do not want to model all aspects of
reality. We want to simplify things and a 2VL does that quite nicely.
> >When DeMorgan's law goes out the window, the data and interpretation
> >become unnecessarily complex (for most applications and most people).
>
> I must be missing something. How exactly does DeMorgan's law cease to
> apply with 3VL?
I'm not an expert here, but I believe you have to enhance
~(p ^ q) --> ~p v ~q
to add nulls into the mix for a 3VL. Otherwise,
Uncle Vernon's age is unknown, Aunt Marge is 47
p=Uncle Vernon is over 40
q=Aunt Marge is over 40
p is neither T nor F so we don't have both p and q being T so we do
have
~(p ^ q) but we don't have the implication being true
Even if we can have an enhanced De Morgan's law for 3VL, it is not as
intuitive to a human (OK, me!) who would think that if the first part
is the case, then the implication would follow.
Did that make sense? If not, I'll admit that I'm not thinking this all
the way through right now, so if someone else wants to shed light on
this or explain why I might "feel" like De Morgan's law as I expect it
to work goes out the window in a 3VL please jump in.
> (snip)
> >> > If I look at my empty mug I can truthfully say
> >> >that there is an absence of any liquid in it - because it's empty.
> >
> >Sortof like an empty set, eh?
> >
> >> And that fact can truthfully be represented in a column MugEmpty with
> >> domain {'Yes', 'No'}, or in a column MugContents with a numeric domain
> >> that stores the amount of liquid (in cc) in your mug.
> >>
> >>
> >> > That
> >> >does *not* mean I don't know what's in it. I don't have to look for any
> >> >other facts. It is, very plainly and obviously, empty. If I have a rule
> >> >that I only store data (known facts) in a database then I'll know that
> >> >where I have a NULL then the NULL represents an absence of any value.
> >> >It most certainly will not signify that I have an unknown value. It
> >> >will signify that there is no value where one would normally be
> >> >expected.
> >>
> >> Exactly! That's what I've been saying all the time: NULL represents the
> >> absence of a value. Since the question "what is in your mug" can't be
> >> answer currently, there is no value to store in the cell that is
> >> reserved to hold this answer. It'll be NULL - no value here. This cell
> >> in the database is empty.
> >
> >Or if you want to have values for every cell so that you can answer
> >comparison questions, and if the dbms permits sets in cells, then
> >thinking of the cell has holding an empty set would make sense, right?
>
> *IF* the database allowed to store a set in a cell, then it might make
> sense to store an empty set for an empty mug.
I'll take that, thanks. Now move away from what the database allows us
to store to the mathematical model for what is stored. You can model
the scalar as a set with one value and the NULL as a null set. I
realize this isn't the RM model for it.
> And it would also make
> sense to store the contents of my mug as {Coffee}, but the contents of
> my wife's mug as {Coffee, Sugar}.
Yes.
> In SQL databases, sets can't be stored in a cell.
I noticed that.
> If the complete set of
> contents of the mug has to be stored, the first rule of normalisation
> requires us to use a seperate table.
Unless you are using Date's new, revised version of 1NF as discussed in
threads earlier this year. But I use your def of 1NF and continue to
be opposed to 1NF.
> My wife's mug would result in two
> rows.
Such a shame. Sometimes the data modeler will either simplify reality
and not capture all of the contents or perhaps add in attributes for
hasSugar, hasCream, hasIrishCream, ... to the first table rather than
making a second one with multiple rows for cup contents. Wouldn't it
be great if we could model this with a list in a cell? ;-)
> My mug in one row. And Mike's empty mug would get no rows at all
> in this table, since that's how SQL databases store an empty set.
>
> But the original discussion was about NULL being used to represent a
> missing *scalar*.
The 2VL I use does use null to represent a missing scalar, but, again,
a mathematical model that could be used to explain how it works could
model the scalars as sets of a single value and the null as an empty
set.
> >> > It will signify that the "thing" is empty.
> >>
> >> If "thing" refers to the cell in the database: indeed.
> >>
> >> If "thing" refers to your mug: NO. There can be many reasons why the
> >> answer to the question "what is in your mug" is not in the database.
> >> Maybe you refused to answer. Maybe I forgot to ask. Maybe you didn't
> >> know. Maybe the mug was empty.
> >
> >If I KNOW it is empty, it seems like a value that represents this
> >knowledge might be a better approach. This is easy with a char
> >attribute which could be coded with EMPTY or NA or NONE, but not as
> >easy in a strongly typed environment with numeric types.
>
> All these values (EMPTY, NA, NONE) are not in the domain of valid
> liquids. They should therefore be disallowed in this column (either by a
> CHECK constraint or by a FOREIGN KEY constraint).
If it is valid to know that a cup has zero liquid in it, then surely 0
could be in the domain of a liquidAmount attribute. Similarly, if
EMPTY is a possible value for whatever attribute you identify, then
that needs to be in the domain of the attribute, right? That is what a
domain is -- the list of possible values.
> (snip)
> >> Using
> >> NULL to say that there definitely are none might be valid in Pick (I'll
> >> just have to trust you on that, as I don't know PicK),
> >
> >no, no, I'll differ on that
>
> Since I don't know Pick, I'll have to believe everything Mike says about
> it. I'll also have to believe everything you say about it. With the
> interesting result that I am now totally believeing two mutually
> exclusive statements about Pick. <g>
You might have noticed that various RM folks disagree at times too,
right?
In this case, Mike is right that it is possible to let NULL for a
particular attribute mean that we know the value and it is "none."
>>From a data modeling standpoint, that would be frowned upon (thus my
"no" response). I'm sure that is done whenever the modeler deems it
unnecessary to distinquish between a "don't know" and "do know that
there is none" but I would discourage it.
Cheers! --dawn
.
- Follow-Ups:
- Re: 3vl 2vl and NULL
- From: Hugo Kornelis
- Re: 3vl 2vl and NULL
- References:
- 3vl 2vl and NULL
- From: David Cressey
- Re: 3vl 2vl and NULL
- From: michael
- Re: 3vl 2vl and NULL
- From: Hugo Kornelis
- Re: 3vl 2vl and NULL
- From: michael
- Re: 3vl 2vl and NULL
- From: Hugo Kornelis
- Re: 3vl 2vl and NULL
- From: michael
- Re: 3vl 2vl and NULL
- From: Hugo Kornelis
- Re: 3vl 2vl and NULL
- From: dawn
- Re: 3vl 2vl and NULL
- From: Hugo Kornelis
- 3vl 2vl and NULL
- Prev by Date: Re: So what's null then if it's not nothing?
- Next by Date: Re: So what's null then if it's not nothing?
- Previous by thread: Re: 3vl 2vl and NULL
- Next by thread: Re: 3vl 2vl and NULL
- Index(es):
Relevant Pages
|