Re: So what's null then if it's not nothing?



In article <1133286593.699982.59590@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
boston103@xxxxxxxxxxx says...
>
> Jon Heggland wrote:
> > By the exact same argument, duplicate NULLs should *not* be removed,
> > because they are not equal. Choosing one over the other is an arbitrary
> > decision; an artifact of how you formulate the SELECT DISTINCT
> > operation.
>
> No, the argument is not exactly the same. We have stipulated two
> independent rules of the game:
>
> a. two nulls are not distinct; b. two nulls are not equal

a. Projection removes all but one of tuples that are not distinct.
b. Projection retains tuples so that all are not equal to any other.

Pick either one, but I can't see how you can claim that one is more
correct than the other, except for the reason that Codd said so.

> In the 3VL, neither implies the other.

They don't in 2VL either; the *negation* of one implies the other. And I
don't see how this in any way invalidates my argument. Codd's projection
rule is based on this:

"In applying this nonduplication rule, a null value in one tuple is
regarded as the same as a null value in another. This identification of
one null value with another may appear to be in contradiction with our
assignment of truth value to the test w = w [NULL = NULL]. However,
tuple identification for duplicate removal is an operation at a lower
level of detail than equality testing in the evaluation of retrieval
conditions. Hence, it is possible to adopt a different rule."

I don't buy this "lower level of detail" thing. It is taken out of thin
air; handwaving to resolve a glaring inconsistency.

> Whether the rules make
> *practical sense* is another matter entirely ;)

Come on. That is exactly the matter *I* am discussing, at least.

> > And this "unknown maps to false"---where do you get that from?
>
> From the same article I've mentioned multiple times. E.g. select
> produces only the tuples for which the select predicate evaluates to
> true. In the 2VL, it means effectively that unknown is treated as
> false.

Codd never uses the phrase "maps to false", and I think it is dangerous
and confusing to do so. E.g. it could be argued that since unknown "maps
to false", "not unknown" maps to "not false", I.e. true. You say
constraints leads to tuples being rejected if false; this is equivalent
to saying that constraints are always true or unknown (of course, "or"
here is not the logical operator)---if you want to "map" this to 2VL, it
makes as much sense to say that constraints are always true, and unknown
maps to true.

> > > Oh yes. Ascending or descending has got nothing to do with null
> > > 'ordering'. In order to place nulls at the top/bottom of a result set,
> > > one has to say 'order by nulls first/last' with 'last' being the
> > > default thus creating a user mapping that converts nulls to the
> > > least/greatest value.
> >
> > And to place 'A' before 'B', one has to say that 'order A before B'.
>
> I do not understand what you are trying to say here. There is some
> natural ordering defined on various data types , that's what 'order
> by' uses. There is obviously no such ordering amongst nulls,
> therefore one has to specify ordering manually, using words like nulls
> last or nulls first.

And for the user, at the logical level, it seems that the sorting
algorithm compared null to 'A', and determined that it was less. To
define a natural ordering on a data type, and to specify ordering
manually---what is the difference?

> > > Say, you have a set {1,2,3}, then you
> > > get a new value 1. Clearly, you have a duplicate. Now, let's assume
> > > your new value is null. You cannot say whether it's a duplicate or not
> > > -- the result of set membership operation (or comparison if you prefer)
> > > is unknown. The 3vl unknown is mapped to false, therefore, null is
> > > *not* a duplicate.
> >
> > Now insert NULL again. Still no duplicate?
>
> We *do not know* so the rule of the game is to be optimistic and hope
> it is not a duplicate.

It may be that we are talking past each other. You seem to be explaining
what the SQL rules *are*, while I am questioning if they are sensible---
and an appeal to "the rule of the game" does not cut it for me. I'm not
comfortable with a logical system where { null, null } is a set.

> >Not according to SQL,
>
> What do you mean ? The SQL unique constraint does not consider two
> nulls as duplicates.

I know! But does it make sense?

> > But now take a table with a composite key (or a set of tuples, if you
> > prefer). Insert (NULL, NULL). Insert (NULL, NULL) again. No duplicate.
> > Insert (1, NULL). No duplicate. Insert (1, NULL) again. Duplicate! Why?
>
> Because, the implementation(s) deviates from the standard (for a lot
> of practical reasons tha may or may not make sense to everyone).

Fair enough.

> > > With constraints, one wants to be sure that a value is *rejected* if
> > > the constraint is *NOT* satisfied.
> >
> > I might want to be sure that the value is accepted only if the
> > constraint is satisfied. Why is one more correct than the other? I know
> > which one I think is easier to state and understand.
>
> Then, in your pessimistic database world, you'd reject all the record
> for people with unknown phone number, unknown age, unknown middle
> name, etc.

Not at all, and I find it very strange that you draw this unfounded
conclusion. Let me explain.

Say we have a 3VL database system with NULLs, with constraints required
to evaluate to true. I have a table of people and their age, and age can
be unknown, represented by NULL. And I have a restriction, saying "age >
10". In my world, a tuple with NULL age would indeed be rejected, as I
expect it too---after all, NULL is not greater than 10. However, if I
want to be able to put unknown ages in my table, I would have the
constraint say "age is NULL or age > 10". This explicitly says that age
may be unknown; if it is not, it must be greater than 10.

These two approaches (you may call them the pessimistic and the
optimistic) have exactly the same expressive power. Therefore, none of
them is more correct than the other. But I consider the pessimistic
approach more intuitive and easier to understand. After all, SQL
constraints are expressed as a positive predicate checking that some
condition is satisfied, e.g. "CHECK age > 10". If the syntax instead had
been negative, e.g. "REJECT IF age <= 10", the optimistic approach
(allowing NULLs by default) would be more intuitive.

In fact, all database textbooks I have read (e.g. Elmasri and Navathe;
Garcia-Molina, Ullman and Widom), describe constraints as "assertions"
or "conditions" that "must hold" or "must hold true". With regard to
SQL, this is actually incorrect; they must not not (sic) hold, which is
a different thing in 3VL. It breeds confusion, and I think SQL is to
blame, not the textbooks.

(By the way, does Codd say anything about whether constraints must be
true, or merely not be false? I can't find anything in the article.)

> > Create a table with an attribute A, and a constraint that says that A
> > must be greater than five. Insert ten rows successfully. Select the rows
> > where A is greater than five. How many rows do you get?
>
> I'll get as many rows as there are satisfying the greater than 5
> predicate. There might be some rows with nulls, but I won't not get
> those (which should be evident based on our discussion).

Yes, and I believe this comes as a big surprise for people who have not
followed our discussion, and not delved into the the intricacies of the
SQL standard; but instead have taken at face value the obvious notion
that constraints are meant to hold. Always.

Not to mention that it must bug the hell out of the query optimizer.

> > Ok. Now, with 2VL, the unique constraint behaviour does *not* depend on
> > how you define it, and there is no decision to arbitrarily make. It is
> > clear and simple, and doesn't surprise anyone. I think it is better.
>
> Then, do not use nulls and all the real or perceived problems will go
> away.

Except that SQL has a tendency to produce NULLs whether I want to or
not, and the design and implementation of SQL systems are more
complicated, error-prone and slow than they need to.

So, the solution to SQL's problems is to not use SQL. Fair enough, but
not a very interesting conclusion.
--
Jon
.



Relevant Pages

  • Re: Why does 1:3 relationsihp require another table?
    ... practically difficult if not impossible and that is why NULLs or any such ... If representation of unknown value is the idea behind the ... implementation of NULLs in SQL, then SQL avoids far too many kinds of ... By assuming support for just one kind of missing information (Unknown value ...
    (microsoft.public.sqlserver.programming)
  • Re: When is a NULL Not a NULL ?
    ... I would always *try* to avoid them, Nulls that is, with careful database ... you are speaking only of a convention among SQL users. ... something that is unknown. ...
    (microsoft.public.access.adp.sqlserver)
  • Re: So whats null then if its not nothing?
    ... Does SQL's definition of null (unknown) include the null ... > whether it's null (as in an empty string) or not. ... Part of the problem with undertsanding nulls is that conceptually SQL ...
    (comp.databases.theory)
  • Re: So whats null then if its not nothing?
    ... Does SQL's definition of null (unknown) include the null ... > whether it's null (as in an empty string) or not. ... Part of the problem with undertsanding nulls is that conceptually SQL ...
    (comp.databases.theory)
  • Re: Null and not equal in Join Condition
    ... Hi Robert, ... NULLS or not. ... Any comparison to a NULL value will always be "unknown". ... equal to a value that's not in the database. ...
    (microsoft.public.sqlserver.programming)