Re: Why all the max length constraints?




J M Davitt wrote:
dawn wrote:
J M Davitt wrote:

dawn wrote:

[OK, here is my next "stupid question" as I cut a path in my study of
the RM. Those teachers who just want to tell this student how ignorant
she is are welcome to sit this out as I really am hoping to
understand.]

[Quiz time: what are scalars, tuples, and relations?]

It's a quiz. These are fundamental terms in the relational model.

Yes, I know. Why do you think I could not define them? I've provided
definitions, likely for each of these, for the glossary at one time or
another, along with defs for "functions" which are relations.

I have read quite a bit about the RM (including TTM and almost all of
Date's 8e). My questions are typically about topics not covered in
such places and are all about the application and usefulness of the
theory related to databases. They stem from having had a belief that
the RM was the way to go and then seeing how much more bang for the
buck at least one non-RM-database model was for a company. So, I'm
trying to square up what I've learned in the real world with the theory
that has driven the database industry for the past couple of decades.
There is still a gap, but I'm understanding a bit more where the
implementations of the RM have taken the industry a bit astray from
what I have seen as best practices (2VL, LVAs, variable length
variables, for example).

You define scalar and I'll define tuples and relations.
btw, I think my def of relation is already in the cdt dictionary.


In SQL-DBMS's, like VSAM (and other indexed sequential files before
them) a lot of attributes are specified with max length constraints.

I don't mean to pick nits, but I don't grok "VSAM" and SQL-DBMS's.


It is not a nit pick for you to point out things you don't fully
understand.

Okay, let's be explicit: what do SQL and VSAM have to do with each
other?

Developers using each include max lengths in the metadata, given them
both that good old computer card feel where you list the attributes in
some order (recognizing order is different for cards and vsam than for
sql-dbms's) with their lengths (and types) when defining your schema.
It almost make me want to add a Hollerith code, length of 1, of course.

IIRC, VSAM provided for an OCCURS-like construction in record layouts -
but all that meant was that you could have a variable number of fields,
all of fixed width, up to some specified maximum.


Indexed sequential files were a natural progression from cards. It is
easy to see how fixed lengths arose out of working with card decks. I
wondering if the concept was simply never dropped or had a reason for
continuing.

Fine. But off-point. Isn't it true that the variable in VSAM means
a range of fixed-width fields?

There can be an OCCURS clause (in COBOL) for cardinality > 1 and there
can also be fields of arity > 1, so we lost both of those features with
SQL, but otherwise it was just defining a virtual card, which could be
80 columns once it wasn't read in or written to an actual card. The following is from memory and I haven't coded COBOL since 88, but it was something like

01 PERSON.
03 PERSON_ID PIC X(9).
03 NAME.
05 FIRST_NAME PIC X(15).
05 LAST_NAME PIC X(20).
03 GENDER PIC X.
03 AGE PIC 999.

Which might become
CREATE TABLE PERSON
PERSON_ID CHAR(9),
FIRST_NAME CHAR(15),
LAST_NAME CHAR(20),
GENDER CHAR(1),
AGE INT(3);

The changes for the SQL table were the removal of attributres with
higher cardinality or arity than 1 and the idea that the order was not
important. Otherwise, these look rather similar and unlike languages
where you need not define a max length for every variable. IBM played a
major role with both too, which might also be irrelevant.

MicroData, Pick,
Progress, &c, mean very different things when they describe things as
variable.


The first two should mean the same if you are referring to MicroData
Reality. There are some good stories on the splitting of ways between
*** Pick and Don Fuller, but yes, I'm sure that "variable" is used for
different things.

And doesn't their use of the term variable differ from the use of the
term in VSAM? And don't you want to clear this up before moving
forward? After all, you tossed VSAM and "all data as variable in
length" into the discussion about fixed-width attributes. (Or, more
precisely, you used the phrase "max length constraints."

I'm not sure what is not clear. I used the term "max length
constraints" so that it was clear what I meant by variable length
(being the opposite where there are no such maximum length constraints
on the data values). Sorry I'm not understanding your question. Did
this clear it up?

While there are some attributes where this constraint is related to a
conceptual constraint (from the analysis phase), these lengths are
often introduced for the logical model or implemenation in the DBMS.

In other words, when mapping from the conceptual (analysis) to the
logical (design) data models (pick the terms you like best for these),
these constraints are designed for many attributes that have no such
conceptual/business limits (if implemented with a paper system, there
would be no such limit, for example).

Is there something about the RM that would prompt all (or most?)
existing implementations (however flawed) to drive developers to add in
these constraints for performance, space saving, or other reasons?

No.


I
realize there can be variable length attributes, but specifying a max
field length still seems to be the norm (is that still the case?)

Yes.


As many of you know, I work with database management systems that treat
all data as variable in length, while one might specify a length for
display purposes.

I can't imagine that it's useful for 'Smith, Joseph' and 'Smith, John'
to appear as identical values when, say, displayed in a field of eight
characters.


Obviously not if truncated. The default with Pick would be to wrap in
any display. This is usually good, but has its own issues too, of
course. For every solution...


I also work with products where all data are of variable
length. (There is a maximum, but it's huge.) PITA. This mis-feature
accounts for a fair number of support calls.


What is the best mitigation strategy in your case? If you know that an
attribute must have no more than two characters, what do you do?

Restrict it to one or two characters, obviously. Wouldn't you?

Yup, so what's the problem that becomes a PITA and what makes it
difficult to correct or mitigate it to avoid the support calls?


Thanks for any insights into database attribute length constraints,
their purpose (is it related to output with fixed fonts, database
performance, size or what?), and any correlation to implementations
(again, however flawed) of the RM,

The issue has nothing to do with the relational model.


I know nothing of how dbms products are designed internally. I have
this idea that possibly the fact that everything in XML, Pick, and
elsewhere is ordered gives them more of a likelihood of permitting
variable length and handling it well since you can tag-delimit values.


Is there possibly something about how RDBMS's would need to allocate
space for columns or something that might make it advantageous for
there to be a max length? The other thought I had was that working
with sets and joins might mean that implementing with everything
variable length could be problematic for allocation of space somehow,
but I (obviously) don't know.

But, you know, there is always a "max length." It may be a large
value, but it's there.

Yes, understood. My question is about the logical data model (pick
your favorite name for it) -- the definition of the schema to the
computer. Why give a max length to an attribute that doesn't
conceptually require such? Why can't the DBMS handle that for you
(efficiently, of course)?

IIRC, 32K is common in the products I cited
in my earlier post. Can you verify that?

if there is such. Could a vendor

write an implementation of the RM where length constraints are as rare
as they are in the conceptual model without introducing performance or
any other issues?

There are already many products that can do that. But adding the
phrase "without introducing performance or other issues" is naive.
You should know better; everything has a cost.

I'm referring to run-time issues. Of course there would be a cost to
change a product. At run-time, fewer constraints could be a
performance boost. I can't think of any reason why MV systems would be
faster if there were max length constraints in the DBMS, for example.
Your answer makes me think there is something about the way RM
implementations (or approximations thereof) operate that is enhanced by
knowing how many card columns are required, thereby prompting
developers to specify this physical design and limit the implementation
unnecessarily (in ways the conceptual model need not).

Is your answer to this question "yes"? If so, why hasn't anyone done
so? It seems it would help with the goal of decoupling the physical
from the logical. If there is no conceptual reason for a limit,
couldn't the dbms take care of such physical issues?

This is not what others are referring to when they mention separation
between logical and physical design.

I know ;-)


I'm looking right now at all of the aspects of a conceptual design that
need to be adjusted in what I was calling the logical (data) design
(the one implemented by means of a person or program interfacing with
the DBMS) and how such designs are done differently using the RM (or an
implementation thereof, however flawed) or MV.


That's a step down the wrong path. How could any subsequent design
effort change the conceptual model? It's either complete and correct
-- or it isn't.

No, no, I'm mapping the conceptual model for the purpose of
implementation. The conceptual model, the business requirements, might
not have a limit on the size of a color attribute, for example, but
then in the "implementation model" or what I and others have called the
"logical data model" we add in a length of 12 characters, for example.

Did that help clarify? --dawn

.


Loading