Re: Examples of SQL anomalies?
- From: "Brian Selzer" <brian@xxxxxxxxxxxxxxxxxxx>
- Date: Sat, 5 Jul 2008 20:40:23 -0400
"Cimode" <cimode@xxxxxxxxxxx> wrote in message
news:a2546fe6-6a1d-4560-a782-8cfef53e779f@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On 5 juil, 01:52, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"Cimode" <cim...@xxxxxxxxxxx> wrote in messageI should have written..*next* best thing is extrapolation. Neither
news:47943627-3b66-4fb2-950f-46359441f020@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
On 5 juil, 00:00, "Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote:
"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message
news:jSwbk.249$rb1.205@xxxxxxxxxxx
"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:zOvbk.31378$ZE5.24693@xxxxxxxxxxxxxxxxxxxxxxx
"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message
news:D8ubk.240$rb1.211@xxxxxxxxxxx
"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:gI6bk.11000$cW3.8591@xxxxxxxxxxxxxxxxxxxxxxx
"David Cressey" <cresse...@xxxxxxxxxxx> wrote in message
news:wq4bk.28$0V1.10@xxxxxxxxxxx
"Brian Selzer" <br...@xxxxxxxxxxxxxxxxxxx> wrote in message
news:nFD9k.5753$LG4.2422@xxxxxxxxxxxxxxxxxxxxxxx
"-CELKO-" <jcelko...@xxxxxxxxxxxxx> wrote in message
news:f219a6bd-9d8e-4cfe-9d60-ce9dcaeff16d@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
specificationThe question is, if these issues are due to the SQL
be,or
simply due to a problem in a specific SQL product. Or
could
it
databasethat
the definition is not precise enough in some points, so
that
vendors implemented it differently? <<
left.andNope, it is the specs. All aggregate (set) functions begin
by
removing the NULLs from their parameter set, then if there
is
a
DISTINCT option on the parameter, they remove redundant
duplicates
finally do the operation (MIN, MAX, AVG, SUM, COUNT on what
is
operation,Since an empty set has no elements upon which to apply an
SQL returns a NULL (okay, it should be an "undefined" if we
were
mathematically correct).
itMIN, MAX and AVG are meaningless when applied to an empty
bag,
but
ofseems
to me that COUNT should always return 0 when the bag is
empty,
and
similarly, SUM should return 0. SUM should only return NULL
if
one
the
values to be summed is NULL.
By definition, none of the values to be summed are NULL.
isOK. OK. Yes, I know. NULL is not a value. I guess to beshould
precise,
I
have said "if the cardinality of the set of rows targeted by the
query
greater than the number of values to be summed, then SUM should
return
NULL," but I think that would have caused more confusion.
Bottom
line:
if
not all of the amounts are known, then the total amount is
suspect.
doThis has to do with a continuing disagreement between you and me.
In
my
view there is no such thing as the "set of rows targeted by the
query".
There is a set of data targeted by the query. That is all. Rows
that
thenot contain any data with regard to the current query are not
part
of
target, by definition.
notThus, sum (SALARY) from FUBAR is the sum of a bag of salaries.
It's
the sum of a bag of rows from FUBAR. Rows in FUBAR that don't
contain
a
SALARY are not part of the sum. That is all.
What about sum (SALARY) from FUBAR where Fu = 'Bar'? Doesn't
"where
Fu =
'Bar'" restrict the query to the result of evaluating "where Fu =
'Bar'"?
Clearly if COUNT(*) where Fu = 'Bar' is greater than COUNT(*) where
Fu
=
'Bar' and SALARY is not NULL, then it is obvious that some salaries
aren't
known, hence the total of those salaries, sum (SALARY) where Fu =
'Bar'
is
suspect.
I can't understand that you don't get this since it is just so
simple.
It is simple, I do get it, and you are wrong.
How can that be? If I have ten packages that need to be shipped, but
I
only
know what eight of them weigh, then how can I print a bill of lading
that
requires the total weight for the shipment? Obviously, the sum of
just
the
eight weights will be less than what is actually shipped, so it should
be
obvious that any attempt to pass that sum off as the total weight must
be
held suspect.
How about "the system you plan on designing with NULLS does not handle
that case OR it handles only te case for the 8 weigthed items"
limitation. Best thing is extrapolation.
Best thing is to treat the sum as suspect, as I have been suggesting from
the start. Extrapolation is problematic because there is no reason to
assume that the remaining items are comparable to those whose weight has
already been supplied. Suppose for example that the eight items for
which
the weight has already been supplied are each under 2 pounds, but that
the
remaining two items each weigh well over a ton. Extrapolation would
yield
an estimate orders of magnitude in error.
extrapolation or flag a result as suspect are correct solutions to the
problem. In fact both of them are hacks the former being a more
sophisticated then the latter.
*suspect* has exactly the same downsides than *extrapolation* except
that it may lead to further confusion due to subjective interpretation
by users for the exact same reasons you mentionned. Why would an
output be more suspect than another. Definitely not a good idea.
Extrapolation through mathematical functions at least has the
advantage of allowing the system to use probabilism
It sounds to me like you're arguing against missing information. Missing
information is a fact of life--regardless of whether it represented as nulls
or as absent tuples. In fact, the latter requires an open-world
interpretation, since under the closed-world interpretation, if a tuple
isn't present, then the assertion it represents is false. On the other
hand, a null indicates that there should be a value but it hasn't been
supplied. It is in accord with the closed-world interpretation since the
assertion represented by an incomplete tuple is true. The bottom line is
that if information can be missing, such as the weight of a package, then
you're going to be faced with indeterminate results, regardless of how the
information or lack thereof is represented. I think *suspect* does not have
the downsides of *extrapolation* because it is the user that is guessing,
not the system. I am more inclined to believe that the user is better able
to make educated guesses than the system. Ask yourself this: would the user
be informed that the answer to his query is the result of an extrapolation?
Unlikely. If, however, the result of a sum is NULL, and if the user knows
that a NULL result means that information that would normally have gone into
that result is missing, then the user can query further to find out what is
missing and make a better educated guess, or even perhaps supply what is
missing. Why would the user bother to inquire further if he already has an
answer--even if that answer is orders of magnitude off the mark?
.
- Follow-Ups:
- Re: Examples of SQL anomalies?
- From: Cimode
- Re: Examples of SQL anomalies?
- References:
- Re: Examples of SQL anomalies?
- From: David Cressey
- Re: Examples of SQL anomalies?
- From: Brian Selzer
- Re: Examples of SQL anomalies?
- From: David Cressey
- Re: Examples of SQL anomalies?
- From: Brian Selzer
- Re: Examples of SQL anomalies?
- From: David Cressey
- Re: Examples of SQL anomalies?
- From: Brian Selzer
- Re: Examples of SQL anomalies?
- From: Cimode
- Re: Examples of SQL anomalies?
- From: Brian Selzer
- Re: Examples of SQL anomalies?
- From: Cimode
- Re: Examples of SQL anomalies?
- Prev by Date: Re: Examples of SQL anomalies?
- Next by Date: Re: Examples of SQL anomalies?
- Previous by thread: Re: Examples of SQL anomalies?
- Next by thread: Re: Examples of SQL anomalies?
- Index(es):
Relevant Pages
|