Re: How do i write Set based queries and avoid a cursor?
- From: --CELKO-- <jcelko212@xxxxxxxxxxxxx>
- Date: Sat, 27 Sep 2008 18:23:16 -0700 (PDT)
1. The names violate ISO-11179 rules. You actually have a column
named "name" -- could you be vaguer?? Well, "datasource" pretty
generic. Actually, you were vaguer with "entity_id" which is absurd.
The essence of an identifier is that it is specific to one and only
one set of entities. To believe in a universal, generic "entity_id"
is Kabala mystic tradition which all things in creation have a number
assigned by the Lord God of Israel. We name a data element for what
it is by its nature; not for how it is stored, or used in a particular
situation.
2. Even MS gave up on camelCasing; it is a bitch to maintain because
your eyes jump to uppercase letters then back again. Start using the
underscore to separate the attribute name from its roles and
properties. People read it in a linear fashion, thanks to a century
of ruled notebook paper.
3. Standard SQL and now parts of T-SQL require a semi-colon statement
terminator, so start using it. Another question is why did you use
extra parentheses, single-statement BEGIN-END block, commas at the
start of a line, open and close parentheses on their own line, etc.?
We did that stuff with punch cards so we could slip in more cards
later in the 1960's. We have video terminals and editors now, so stop
formatting your code as if you were is a punch card world.
4. Why did you use "_field" as part of the data element names? Fields
and columns are totally different concepts. I chopped off that file
system legacy. And why are so many things VARCHAR(50), a magic number
from ACCESS? The length of a character column is one of the most
important properties it has, unlike field who length is determined by
the application program reading the file.
5. Table variables (proprietary in T-SQL) and temporary tables (not
ANSI Standard in T-SQL) are attempts to mimic scratch tapes and are to
be avoided. Instead we have CTEs, VIEWs and derived tables. The
optimizer can use these logical virtual tables to best advantage
instead of being forced to materialize them in physical storage.
6. We do not write with flags -- that is more punch card stuff. SQL
is a predicate language. We had to set flags with magnetic tapes and
punch cards after we determined a status in one procedural step to
pass the results to the next procedural step. Hey, back then, big
computers were less powerful than your wristwatch is today. Another
problem with flags is that you have to check them before you use them,
so why not just use the predicate instead of the flag? They force you
to have tightly coupled code modules Remember that term from basic
Software Engineering? It still applies in declarative languages.
Another problem is that you quickly have to write code that plays "20
Questions" and requires insanely complex CHECK() constraints. And the
name "active" is an adjective without a noun, so I changed it to
active_flg. But we need to get rid of it completely. We need a
(start_date, end_date) pair that tells us the duration of the active
status. This is, again, basic data modeling.
7. The construct ((A RIGHT OUTER B) UNION (A LEFT OUTER JOIN B)) in
your generated code is weird for several reasons. First of all, there
is a lot of redundancy (We hate redundancy in RDBMS, like the Pope
hates Evil). Next, we dislike RIGHT OUTER JOINs in our Latin alphabet
culture. We read left to right, so having the preserved table come
later is conceptually hard. Then you did not need the SELECT DISTINCT
because the UNION will remove redundant duplicates for you; let the
optimizers decide how to do that. But the kicker is that the whole
damn mess can probably be reduced to (A FULL OUTER JOIN B), preserving
both tables. Since you did not post DDL, as per netiquette, this is
only a guess.
SELECT ..
FROM (SELECT..
FROM Datasource
WHERE gr_ent_id IS NOT NULL) -- impossible by def
AS E
FULL OUTER JOIN
(SELECT ..
FROM db1.dbo.TrainingGroups
WHERE traininggroup_set_id = tg_set_id)
AS B
ON E.gr_ent_id = B.grouping_entity_id;
8. The comment /* If any training group id is NULL, we need to create
the training group */ is insane. BY DEFINITION, an identifier is
always NOT NULL, unique and exists. That is why it is ALWAYS a key
somewhere in the schema. But we have no DDL to see how screwed up the
schema is. That creation process is another function. I invented the
catch phrase "Automobiles, Squids and Britney Spears" to describe
tables and modules that have no cohesion -- they do more than one task
or store several unrelated things which are of totally different
natures.
9. What kind of things are "TrainingGroup" and "TrainingGroupSet" in a
set-oriented language? Think about it. Is this a hierarchy or
something? You are also telling us you have one of each by the
singular names; did you mean "TrainingGroups" or what?
10. Why did you put an "IF EXIST(..)" in front of an "UPDATE.."
statement? The WHERE clause of the UPDATE will find if the set of
rows to be updated is empty; just write a good predicate. Your
mindset is still in file systems. An empty set (i.e. table) is still
a set, but a missing file or an empty file (i.e. physical directory
entry NIL pointer) is a real problem in old file systems. Also, never
use that proprietary UPDATE.. FROM.. crap; it is proprietary and has
changed semantics at least twice since it was invent decades ago by
Sybase. Learn to write good Standard SQL.
11. The "EXISTS (SELECT 1.." tells me that you learned SQL from an old
Oracle person. The modern, preferred form is "EXISTS (SELECT * ..."
instead. Before Oracle had an optimizer, this made a difference in
performance decades ago. Now, it is just a confusing antique -- the
watch pocket in blue jean of SQL. All the SQLs I know will do it the
same way, but it makes you look like a hick.
12. Please set up basic naming conventions. I can guess that "tg_" is
"Testgroup_", that "ds" is "Datasource" and "gr_" is "group_", but why
not use a text editor and be consistent? Data dictionary needs this.
A real professional is always asking the question "How can I make this
code easier for the poor dumb bastards that have to maintain it after
I get my dot-com stock options and retire in the South of France?"
13. Dynamic SQL is usually an admission that any future random
untrained user will be able to write better code than you can. It is
right up there with cursors as a kludge, but it can allow SQL
Injection and destroy the company (read about T. J. Maxx and the
credit card attack in the trade press).
14. Why did you generate "(nmg_prfx + gr_ent_name) AS "vague_name"?
The data element names change from the rest of the schema, which is
awful. But this implied that you have split an attribute across two
columns when it needs to be in one.
15. You declared a table variable named "Testgroup_Set_Matrix" which
is as bad as or worst than the "_field" in those column names. First,
it violates the rule that a data element is named for what it is and
not how it is stored. But you cannot store a matrix in SQL! We have
one and only one data structure in RDBMS; the table. A matrix is an
algebraic structure with math rules, transposes, inverses, eigenvalues
and lots of other properties. Words have meaning, so don't snorf your
dinglehoppers!
16. This table ought to be a CTE and that still stinks. It would look
like this:
WITH Active_Testgroup_Set (..)
AS
(SELECT traininggroup_set_id, traininggroup_set_name,
datasource, -- why is this both a table & column name?
grouping_entity_id, grouping_entity_name, naming_prefix
FROM Traininggroup_Set
WHERE active_flg = 1) --flags suck!
17. Combine UPDATEs so you touch the tables only once, something like
this:
UPDATE db1.dbo.TrainingGroups
SET active_flg
= CASE
WHEN EXISTS
(SELECT *
FROM Active_Testgroup_Set AS A
WHERE A.entity_id IS NULL -- impossible by def!
AND TrainingGroups.traininggroup_id = A.traininggroup_id)
THEN 0
WHEN EXISTS
(SELECT *
FROM Active_Testgroup_Set AS A
WHERE TrainingGroups.traininggroup_id = A.traininggroup_id
AND A.entity_id IS NOT NULL -- true by def
AND A.traininggroup_id IS NOT NULL -- true by def
AND A.active_flg = 0)
THEN 1
ELSE active_flg -- otherwise, do nothing
END;
18. I think the DDL is screwed up and that if you can fix it, the
system will perform 1-2 or more orders of magnitude better, be
smaller, and not need convoluted DML to answer queries or change the
structure of the schema.
.
- Follow-Ups:
- References:
- How do i write Set based queries and avoid a cursor?
- From: CK
- Re: How do i write Set based queries and avoid a cursor?
- From: Ed Murphy
- Re: How do i write Set based queries and avoid a cursor?
- From: CK
- Re: How do i write Set based queries and avoid a cursor?
- From: --CELKO--
- Re: How do i write Set based queries and avoid a cursor?
- From: CK
- How do i write Set based queries and avoid a cursor?
- Prev by Date: Re: SQL 2005 management studio vs SQL 2008 express
- Next by Date: Re: How To Look At Chronological Series To Find the First Values Greater Than Zero?
- Previous by thread: Re: How do i write Set based queries and avoid a cursor?
- Next by thread: Re: How do i write Set based queries and avoid a cursor?
- Index(es):