Re: So what's null then if it's not nothing?
- From: Hugo Kornelis <hugo@xxxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 30 Nov 2005 23:11:15 +0100
On Tue, 29 Nov 2005 17:36:59 +0100, Jon Heggland wrote:
>In article <1133277089.665394.263550@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
>boston103@xxxxxxxxxxx says...
>> Jon Heggland wrote:
>> [...]
>>
>> > A SELECT DISTINCT (it is really too kind to call this SQL
>> > construct "projection") eliminates duplicate NULLs, just like it
>> > eliminates duplicate values. To handwave this by saying "well, they're
>> > not *distinct*, but that doesn't mean they're *equal*" is just a bad
>> > excuse.
>>
>> It's simple really if you bother to read something about the 3vl.
>> According to the 3vl rules, whether two nulls are distinct or equal is
>> unknown which maps to false in the 2vl.
>
>By the exact same argument, duplicate NULLs should *not* be removed,
>because they are not equal. Choosing one over the other is an arbitrary
>decision; an artifact of how you formulate the SELECT DISTINCT
>operation.
Hi Jon,
I agree with this. Removing NULLs from the results of a DISTINCT
operation is weird.
But let's place this in a broader perspective. How should NULLs be
handled in a GROUP BY operation. Here you could also say that each NULL
should be a group of it's own - probably the mathematical most correct
way to handle it. But there also much to be said for the idea of
defining that in a GROUP BY, all NULLs should go in one group - like the
"Other" group you typically find in the results of a poll. Though
arguably less "correct", this approach is definitely more geared towards
practical use. I think that the ANSI committee chose the most practical
approach - and I'm glad they did, otherwise we'd all be working around
this all the time!
Back to DISTINCT. This operation is tightly related to GROUP BY. Remove
the aggregate expressions, and GROUP BY and DISTINCT even become
synonyms. That would change if DISTINCT would keep NULLs seperate from
each other. This might have been the reason for the ANSI committee to
define the results of NULLs in DISTINCT analogous to how they defined
the results of NULLs in GROUP BY.
But I still agree - it IS weird, and I believe that DISTINCT should
actually retain each NULL instead of lumping them together.
>And to place 'A' before 'B', one has to say that 'order A before B'.
Indeed: ORDER BY Column1 ASCENDING
>> > > The
>> > > SQL'92 standard says that the column value is a duplicate if it's
>> > > already present in the set of existing column non-nulls
>> >
>> > I'm not able to parse this.
>> >
>>
>> What's so hard about that ?
>
>I didn't understand the phrase "set of existing column non-nulls".
Take all rows in the table - that's the set of existing rows.
Now strip all columns not used in the UNIQUE constraint. What's left is
the set of existing column values.
Remove all rows that have NULL in one of the columns. You now have the
set of existing non-nulls.
New values have to be checked against this set.
>> Say, you have a set {1,2,3}, then you
>> get a new value 1. Clearly, you have a duplicate. Now, let's assume
>> your new value is null. You cannot say whether it's a duplicate or not
>> -- the result of set membership operation (or comparison if you prefer)
>> is unknown. The 3vl unknown is mapped to false, therefore, null is
>> *not* a duplicate.
>
>Now insert NULL again. Still no duplicate? Not according to SQL, and it
>seems you agree that that is sensible.
Correct. (Though Microsoft SQL Server get this wrong - it will reject
the second NULL - I can't call this a bug, because it appears to be by
design, but I can call and will call this design awful).
>But now take a table with a composite key (or a set of tuples, if you
>prefer). Insert (NULL, NULL). Insert (NULL, NULL) again. No duplicate.
>Insert (1, NULL). No duplicate. Insert (1, NULL) again. Duplicate! Why?
If your DB rejects the second insert, you've found a bug. (Or an awful
design <grin>).
The second (1, NULL) should be allowed; it's not a duplicate according
to the ANSI definitions.
>> > > So for NULL, the DUPLICATE predicate would return UNKNOWN (or
>> > > NULL) which is then mapped to false
>> >
>> > Indeed? AFAIK, SQL constraints are considered satisfied if they are true
>> > or if they are unknown.
>>
>> With constraints, one wants to be sure that a value is *rejected* if
>> the constraint is *NOT* satisfied.
>
>I might want to be sure that the value is accepted only if the
>constraint is satisfied. Why is one more correct than the other? I know
>which one I think is easier to state and understand.
In most cases, that's as easy as adding a NOT NULL constraint to the
column.
If that doesn't work in a specific case, then make sure that you
formulate the constraint such that it will evaluate to false for each
row you don't want to accept.
It's not a question of more or less correct, BTW. It's just a question
of how the rules are defined. But it might help to consider that *if* a
constraint would reject all rows that evaluate to false or unknown, most
CHECK constraints would implicitly disallow NULLs in many columns. The
current definition gives the DBA the control to either add or not add an
explicit NOT NULL constraint.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)
.
- Follow-Ups:
- Re: So what's null then if it's not nothing?
- From: Jon Heggland
- Re: So what's null then if it's not nothing?
- Prev by Date: Re: So what's null then if it's not nothing?
- Next by Date: Re: Overriding derived values
- Previous by thread: Re: So what's null then if it's not nothing?
- Next by thread: Re: So what's null then if it's not nothing?
- Index(es):
Relevant Pages
|
Loading