GROUP BY



GROUP BY works on a single relation. The input relation's attributes
are partitioned into one of three categories:

1) those we ignore
2) those we group by
3) those we apply the function to

The result relation will have attributes from 2) above, and in
addition:

4) those created as a result of the applied function.

The functions mentioned in 4) may only use as arguments the attributes
listed in 3).

So for example given R(a, b, c):

select b, sum(c) as d from R group by b

will ignore attribute(s) a, group by attribute(s) b, and apply
a function to attribute(s) c, yielding the additional attribute d.
The result will have attributes (b, d).

One of my favorite lessons from TTM was always to consider the 0-arity
case.

It is not unusual to consider the case of 1) or 2) being empty.
In the first case, we are either grouping by or aggregating over
all attributes. In the second, we aren't grouping, but are just
aggregating over the entire relation; the result will have
a single row.


What if 3) is empty?

In that case, GROUP BY is equivalent to PROJECT!

SQL obscures this fact, due to the templated nature of a SQL query.
The 3)-empty case is:

select b from R group by b

which would just be simplified to:

select b from R

which is how you project in SQL already.


What if 4) uses not an aggregate function, but just a regular
function?

In that case, GROUP BY is equivalent to APPLY! (What TTM calls
"COMPOSE".)

Recall the example from TTM, from the section "Treating Operators as
Relations" (ch. 4).

TWO_AND_TWO <COMPOSE> PLUS

You may immediately object that this introduces the necessity of
duplicate elimination, but <COMPOSE> already had this requirement.
(If you have a set of x values, it is not the case that every f(x)
will be distinct, unless f is one-to-one.)

I have been thinking about the "reverse" of aggregate functions,
if you will: "functions" that take a single input but produce multiple
outputs. They are called "generators" in some languages. (They are not
strictly functions.) It has lately occurred to me that perhaps a
single construct could contain all of project, apply, group by, and
a mechanism for applying generators.

To my shock, reviewing ch. 4 for this note, I found at the bottom
of page 53, TTM says "One implication of this example is that a
relational language such as D might reasonably include an extended
form of EXTEND that--unlike the traditional EXTEND--is not
necessarily limited to producing aexactly one output tuple from
each input tuple."

So it appears I am not the first to think something like this.

Discussion welcome.


Marshall

.



Relevant Pages

  • Re: GROUP BY
    ... One of my favorite lessons from TTM was always to consider the 0-arity ... SQL obscures this fact, due to the templated nature of a SQL query. ... What if 4) uses not an aggregate function, ... a mechanism for applying generators. ...
    (comp.databases.theory)
  • Re: GROUP BY
    ... One of my favorite lessons from TTM was always to consider the 0-arity ... we are either grouping by or aggregating over ... SQL obscures this fact, due to the templated nature of a SQL query. ... necessarily limited to producing aexactly one output tuple from ...
    (comp.databases.theory)
  • Re: So whats null then if its not nothing?
    ... >> paul c wrote: ... > the case for TABLE_DEE which TTM sees as just a value equivalent to TRUE.) ... There is no way, even in SQL, for the aggregate function ... as cnt from t1 returns one row with one attribute 'cnt'. ...
    (comp.databases.theory)
  • Re: Reduce To Ranges aggregate function
    ... Michel Cadot wrote: ... | numbers to a reduced form. ... I'm thinking if there were an aggregate function that would ... SQL> select id from t order by id; ...
    (comp.databases.oracle.misc)
  • Re: group and sort by month / site
    ... "Gary Walter" wrote: ... "part of an aggregate function" because ... From your SQL, it should look like: ... "Emelina Bumsquash" wrote ...
    (microsoft.public.access.queries)