Re: Suggestions for refactoring unusual tables



No need to throw a perfectly good design out because of programmer incompetence. Better throw the programmer away :-) <<

Throw away the idiot who did the EAV. This is not a good design -- in
fact, it is not a design at all. It is a metadata framework which is
lacking in any kind of data integrity or performance.

If the programming is done right, and if data-manipulation is done through a few carefully tested procedures, like

catalog_id = catalog_add(catalog,dshort,dlong)
catalogEntry_id
=catalogEntry_add(catalog_id,systext,dshort,dlong,rank)
getCatalogByName(catalogname)

etc, orphans shouldn't happen. <<

Wrong, as I know all too well from experience. Where do you prevent
an improper reference in the data (i.e. a postal code that does not
exist)? How do you maintain data integrity (we have to cascade a
delete or update on a postal code).

Also, it appears to me that catalog and catalog_entry is only populated during the development, not later. Later only dshort and dlong is changed. <<

Where did you get that spec? How do prevent inserts, updates and
deletes to catalog and catalog_entry?

Why doesn't it look good? Now, you understand what's going on, can you give a reason for not using EAV? <<

I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10), -- what does null mean?
attrib_value VARCHAR (50)); -- what does null mean?

INSERT INTO EAV VALUES ('LOCATION', 'Bedroom');
INSERT INTO EAV VALUES ('LOCATION', 'Dining Room');
INSERT INTO EAV VALUES ('LOCATION', 'Bathroom');
INSERT INTO EAV VALUES ('LOCATION', 'courtyard');
INSERT INTO EAV VALUES ('EVENT', 'verbal aggression');
INSERT INTO EAV VALUES ('EVENT', 'peer');
INSERT INTO EAV VALUES ('EVENT', 'bad behavior');
INSERT INTO EAV VALUES ('EVENT', 'other');

CREATE TABLE EAV_DATA --no constraints, defaults, DRI
(id INTEGER IDENTITY (1, 1) NOT NULL, --vague names
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be Location Event count
(headings if possible)

Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1

Also, if possible, another query would return this result set. (I
think I know how to do this one.)

Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1

Here is an answer from Thomas Coleman

SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue,
EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;

Is the same thing in a proper schema as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all plopped into the same table. There should be separate tables
for Locations and Events. Now write this same thing for a 8 table
query.

The column names are seriously painful. Don't use reserved words like
"key" and "value" for column names; that is metadata. It means that
the developer *has* surround the column name with double quotes for
everything. And they are too vague to be data element names anyway!

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular
is to be nothing." --Aristotle

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.

Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order
system when I tried it as an exercise.

Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.

For those who are interested, there are couple of links to articles I
found on the net:

Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27­/

The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadka­rni/eav_CR_contents.htm

An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadka­rni/Introduction%20to%20EAV%20­systems.htm

Data Extraction and Ad Hoc Query of an Entity— Attribute— Value
Database
http://www.pubmedcentral.nih.g­ov/articlerender.fcgi?tool=pub­med&pubme...

Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.g­ov/articlerender.fcgi?tool=pub­med&pubme...

A really good horror story about this kind of disaster is at:

http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/

All of these 63 tables would look exactly alike; <<

Not in a properly designed schema. Each table will model one and only
set of entities or one and only one relationship. Identical tables
would be another common error -- attribute splitting. This is basic
stuff!!

Would you also create 63 routines for editing each of the 63 tables? <<

I work with properly designed schemas that have hundred tables.
Thanks to declarative DRI actions, DEFAULT, and CHECK() constraints,
most of the work is done by the SQL engine itself. But, yes we do
have more than one procedure to maintain something like the database
for General Motors, a major hospital, etc.
.



Relevant Pages

  • Re: Dynamic data elements for a data collection application
    ... > INSERT INTO EAV VALUES; ... > Bedroom verbal aggression 1 ... > Dining Room verbal aggression 0 ... > courtyard verbal aggression 0 ...
    (comp.databases.ms-sqlserver)
  • Re: Problem with views switching columns
    ... This is called EAV design. ... Design for Generic Clinical Study Data Management Systems" by Prakash ... Similarly, most analytical programs, such as ...
    (microsoft.public.sqlserver.programming)
  • Re: Modelling objects with variable number of properties in an RDBMS
    ... EAV tables. ... I cannot provide more detail than above since the design ... I can think of lots of reasons why ... In fairness it was a saying that summed up an important point. ...
    (comp.databases.theory)
  • Re: Flexible columns
    ... Every 2-3 months a newbie re-discovers the EAV design fallacy! ... attrib_value VARCHAR NULL); ... Dining Room verbal aggression 0 ... courtyard verbal aggression 0 ...
    (comp.databases)
  • Re: Tag/value based database
    ... You couldn't find what a disaster EAV is with a quick Google? ... Bedroom verbal aggression 1 ... Dining Room verbal aggression 0 ... courtyard verbal aggression 0 ...
    (comp.databases)

Loading