Re: Examples of SQL anomalies?




"Cimode" <cimode@xxxxxxxxxxx> wrote in message
news:a2546fe6-6a1d-4560-a782-8cfef53e779f@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On 5 juil, 01:52, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"Cimode" <cim...@xxxxxxxxxxx> wrote in message

news:47943627-3b66-4fb2-950f-46359441f020@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx



On 5 juil, 00:00, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message

news:jSwbk.249$rb1.205@xxxxxxxxxxx

"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:zOvbk.31378$ZE5.24693@xxxxxxxxxxxxxxxxxxxxxxx

"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message
news:D8ubk.240$rb1.211@xxxxxxxxxxx

"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:gI6bk.11000$cW3.8591@xxxxxxxxxxxxxxxxxxxxxxx

"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message
news:wq4bk.28$0V1.10@xxxxxxxxxxx

"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:nFD9k.5753$LG4.2422@xxxxxxxxxxxxxxxxxxxxxxx

"-CELKO-" <jcelko...@xxxxxxxxxxxxx> wrote in message

news:f219a6bd-9d8e-4cfe-9d60-ce9dcaeff16d@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
The question is, if these issues are due to the SQL
specification
or
simply due to a problem in a specific SQL product. Or
could
it
be,
that
the definition is not precise enough in some points, so
that
database
vendors implemented it differently? <<

Nope, it is the specs. All aggregate (set) functions begin
by
removing the NULLs from their parameter set, then if there
is
a
DISTINCT option on the parameter, they remove redundant
duplicates
and
finally do the operation (MIN, MAX, AVG, SUM, COUNT on what
is
left.
Since an empty set has no elements upon which to apply an
operation,
SQL returns a NULL (okay, it should be an "undefined" if we
were
mathematically correct).

MIN, MAX and AVG are meaningless when applied to an empty
bag,
but
it
seems
to me that COUNT should always return 0 when the bag is
empty,
and
similarly, SUM should return 0. SUM should only return NULL
if
one
of
the
values to be summed is NULL.

By definition, none of the values to be summed are NULL.

OK. OK. Yes, I know. NULL is not a value. I guess to be
precise,
I
should
have said "if the cardinality of the set of rows targeted by the
query
is
greater than the number of values to be summed, then SUM should
return
NULL," but I think that would have caused more confusion.
Bottom
line:
if
not all of the amounts are known, then the total amount is
suspect.

This has to do with a continuing disagreement between you and me.
In
my
view there is no such thing as the "set of rows targeted by the
query".
There is a set of data targeted by the query. That is all. Rows
that
do
not contain any data with regard to the current query are not
part
of
the
target, by definition.

Thus, sum (SALARY) from FUBAR is the sum of a bag of salaries.
It's
not
the sum of a bag of rows from FUBAR. Rows in FUBAR that don't
contain
a
SALARY are not part of the sum. That is all.

What about sum (SALARY) from FUBAR where Fu = 'Bar'? Doesn't
"where
Fu =
'Bar'" restrict the query to the result of evaluating "where Fu =
'Bar'"?
Clearly if COUNT(*) where Fu = 'Bar' is greater than COUNT(*) where
Fu
=
'Bar' and SALARY is not NULL, then it is obvious that some salaries
aren't
known, hence the total of those salaries, sum (SALARY) where Fu =
'Bar'
is
suspect.

I can't understand that you don't get this since it is just so
simple.

It is simple, I do get it, and you are wrong.

How can that be? If I have ten packages that need to be shipped, but
I
only
know what eight of them weigh, then how can I print a bill of lading
that
requires the total weight for the shipment? Obviously, the sum of
just
the
eight weights will be less than what is actually shipped, so it should
be
obvious that any attempt to pass that sum off as the total weight must
be
held suspect.

How about "the system you plan on designing with NULLS does not handle
that case OR it handles only te case for the 8 weigthed items"
limitation. Best thing is extrapolation.

Best thing is to treat the sum as suspect, as I have been suggesting from
the start. Extrapolation is problematic because there is no reason to
assume that the remaining items are comparable to those whose weight has
already been supplied. Suppose for example that the eight items for
which
the weight has already been supplied are each under 2 pounds, but that
the
remaining two items each weigh well over a ton. Extrapolation would
yield
an estimate orders of magnitude in error.
I should have written..*next* best thing is extrapolation. Neither
extrapolation or flag a result as suspect are correct solutions to the
problem. In fact both of them are hacks the former being a more
sophisticated then the latter.

*suspect* has exactly the same downsides than *extrapolation* except
that it may lead to further confusion due to subjective interpretation
by users for the exact same reasons you mentionned. Why would an
output be more suspect than another. Definitely not a good idea.
Extrapolation through mathematical functions at least has the
advantage of allowing the system to use probabilism

It sounds to me like you're arguing against missing information. Missing
information is a fact of life--regardless of whether it represented as nulls
or as absent tuples. In fact, the latter requires an open-world
interpretation, since under the closed-world interpretation, if a tuple
isn't present, then the assertion it represents is false. On the other
hand, a null indicates that there should be a value but it hasn't been
supplied. It is in accord with the closed-world interpretation since the
assertion represented by an incomplete tuple is true. The bottom line is
that if information can be missing, such as the weight of a package, then
you're going to be faced with indeterminate results, regardless of how the
information or lack thereof is represented. I think *suspect* does not have
the downsides of *extrapolation* because it is the user that is guessing,
not the system. I am more inclined to believe that the user is better able
to make educated guesses than the system. Ask yourself this: would the user
be informed that the answer to his query is the result of an extrapolation?
Unlikely. If, however, the result of a sum is NULL, and if the user knows
that a NULL result means that information that would normally have gone into
that result is missing, then the user can query further to find out what is
missing and make a better educated guess, or even perhaps supply what is
missing. Why would the user bother to inquire further if he already has an
answer--even if that answer is orders of magnitude off the mark?


.



Relevant Pages

  • Re: Examples of SQL anomalies?
    ... similarly, SUM should return 0. ... not all of the amounts are known, then the total amount is suspect. ... obvious that any attempt to pass that sum off as the total weight must be ... Extrapolation is problematic because there is no reason to ...
    (comp.databases.theory)
  • Re: puzzle
    ... Determine the xor of the numbers 0...n, ... >) sequence. ... Determine the sum of the numbers 0...n, ... The difference of the two sums is the missing number. ...
    (comp.programming)
  • Re: how to find missing value.
    ... then missing number,n = Sum of N natural numbers + Sum of the numbers in Array A. ... the set of numbers would be consecutive if it were sorted, with the exception of the missing number. ...
    (perl.beginners)
  • Re: how to find missing value.
    ... > then missing number,n = Sum of N natural numbers + Sum of the numbers ... > in Array A. ... exception of the missing number. ... Work out the real sum of the values in the list: ...
    (perl.beginners)
  • Re: Changing a field setting in a pivot table
    ... I suspect you have defined a very large range for your PT, to allow for future data entry, which would then cause the PT to default to Count. ... To change all the fields from Count to Sum, then use the following code (from Debra Dalgleish) ... Dim pt As PivotTable, pf As PivotField, ws As Worksheet, i As Long ...
    (microsoft.public.excel.misc)