GROUP BY
- From: Marshall <marshall.spight@xxxxxxxxx>
- Date: 18 May 2007 11:32:46 -0700
GROUP BY works on a single relation. The input relation's attributes
are partitioned into one of three categories:
1) those we ignore
2) those we group by
3) those we apply the function to
The result relation will have attributes from 2) above, and in
addition:
4) those created as a result of the applied function.
The functions mentioned in 4) may only use as arguments the attributes
listed in 3).
So for example given R(a, b, c):
select b, sum(c) as d from R group by b
will ignore attribute(s) a, group by attribute(s) b, and apply
a function to attribute(s) c, yielding the additional attribute d.
The result will have attributes (b, d).
One of my favorite lessons from TTM was always to consider the 0-arity
case.
It is not unusual to consider the case of 1) or 2) being empty.
In the first case, we are either grouping by or aggregating over
all attributes. In the second, we aren't grouping, but are just
aggregating over the entire relation; the result will have
a single row.
What if 3) is empty?
In that case, GROUP BY is equivalent to PROJECT!
SQL obscures this fact, due to the templated nature of a SQL query.
The 3)-empty case is:
select b from R group by b
which would just be simplified to:
select b from R
which is how you project in SQL already.
What if 4) uses not an aggregate function, but just a regular
function?
In that case, GROUP BY is equivalent to APPLY! (What TTM calls
"COMPOSE".)
Recall the example from TTM, from the section "Treating Operators as
Relations" (ch. 4).
TWO_AND_TWO <COMPOSE> PLUS
You may immediately object that this introduces the necessity of
duplicate elimination, but <COMPOSE> already had this requirement.
(If you have a set of x values, it is not the case that every f(x)
will be distinct, unless f is one-to-one.)
I have been thinking about the "reverse" of aggregate functions,
if you will: "functions" that take a single input but produce multiple
outputs. They are called "generators" in some languages. (They are not
strictly functions.) It has lately occurred to me that perhaps a
single construct could contain all of project, apply, group by, and
a mechanism for applying generators.
To my shock, reviewing ch. 4 for this note, I found at the bottom
of page 53, TTM says "One implication of this example is that a
relational language such as D might reasonably include an extended
form of EXTEND that--unlike the traditional EXTEND--is not
necessarily limited to producing aexactly one output tuple from
each input tuple."
So it appears I am not the first to think something like this.
Discussion welcome.
Marshall
.
- Follow-Ups:
- Re: GROUP BY
- From: David Cressey
- Re: GROUP BY
- From: David Portas
- Re: GROUP BY
- From: Marshall
- Re: GROUP BY
- From: paul c
- Re: GROUP BY
- From: Gene Wirchenko
- Re: GROUP BY
- From: David Cressey
- Re: GROUP BY
- From: Bob Badour
- Re: GROUP BY
- Prev by Date: Re: naive questions about nested intervals with Farey fractions
- Next by Date: Re: GROUP BY
- Previous by thread: Databasescheme with UML
- Next by thread: Re: GROUP BY
- Index(es):
Relevant Pages
|